PREFACE

Most of the mathematical theory of statistics in its present state has been developed
during the past twenty years. Because of the variety of scientific fields in which
statistical problems have arisen, the original contributions to this branch of applied
mathematics are widely scattered in scientific literature. Most of the theory still
exists only in original form.

During the past few years the author has conducted a two-semester course at
Princeton University for advanced undergraduates and beginning graduate students in which
an attempt has been made to give the students an introduction to the more recent
developments in the mathematical theory of statistics. The subject matter for this course has
been gleaned, for the most part, from periodical literature. Since it is impossible to
cover in detail any large portion of this literature in two semesters, the course has
been held primarily to the basic mathematics of the material, with just enough problems
and examples for illustrative and examination purposes.

Except for Chapter XI, the contents of the present set of notes constitute the
basic subject matter which this course was designed to cover. Some of the material in
the author's Statistical Inference (1937) has been revised and included. In writing up
the notes an attempt has been made to be as brief and concise as possible and to keep to
the mathematics with a minimum of excursions into applied mathematical statistics problems.

An important topic which has been omitted is that of characteristic functions of
random variables, which, when used in Fourier inversions, provide a direct and powerful
method of determining certain sampling distributions and other random variable
distributions. However, moment generating functions are used; they are more easily understood by
students at this level and are almost as useful as characteristic functions as far as
actual applications to mathematical statistics are concerned. Many specialized topics are
omitted, such as intraclass, tetrachoric and other specialized correlation problems,
semi-invariants, renewal theory, the Behrens-Fisher problem, special transformations of
population parameters and random variables, sampling from Poisson populations, etc. It is
the experience of the author that an effective way for handling many of these specialized
topics is to formulate them as problems for the students. If and when the present notes
are revised and issued in permanent form, such problems will be inserted at the ends of
sections and chapters. In the meantime, criticisms, suggestions, and notices of errors
will be gratefully received from readers.

Finally, the author wishes to express his indebtedness to Dr. Henry Scheffé,
Mr. T. W. Anderson, Jr. and Mr. D. F. Votaw, Jr. for their generous assistance in
preparing these notes. Most of the sections in the first seven chapters and several sections
in Chapters X and XI were prepared by these men, particularly the first two. Thanks are
due Mrs. W. M. Weber for her painstaking preparation of the manuscript for lithoprinting.

S. S. Wilks.

Princeton, New Jersey
April, 1943.
CHAPTER I


INTRODUCTION

Modern statistical methodology may be conveniently divided into two broad
classes. To one of these classes belongs the routine collection, tabulation, and
description of large masses of data per se, most of the work being reduced to high speed
mechanized procedures. Here elementary mathematical methods such as percentaging,
averaging, graphing, etc. are used for condensing and describing the data as it is.
To the other class belongs a methodology which has been developed for making predictions
or drawing inferences, from a given set or sample of observations about a larger set or
population of potential observations. In this type of methodology, we find the
mathematical methods more advanced, with the theory of probability playing the fundamental
role. In this course, we shall be concerned with the mathematics of this second class
of methodology. It is natural that these mathematical methods should embody assumptions
and operations of a purely mathematical character which correspond to properties and
operations relating to the actual observations. The test of the applicability of the
mathematics in this field, as in any other branch of applied mathematics, consists in
comparing the predictions as calculated from the mathematical model with what actually
happens experimentally.*

Since probability theory is fundamental in this branch of mathematics, we
should examine informally at this point some notions which at least suggest a way of
setting up a probability theory. As far as the present discussion is concerned, perhaps
the best approach is to examine a few simple empirical situations and see how we would
proceed to idealize and to set up a theory. Suppose a die is thrown successively. If
we denote by $X$ the number of dots appearing on the upper face of the die, then $X$ will
take on one of the values 1, 2, 3, 4, 5, 6 at each throw. The variable $X$ jumps from


*For an example of such a comparison, see Ch. 5 of Bortkiewicz' Die Iterationen, Springer,
Berlin, 1917.
value to value as the die is thrown successively, thus yielding a sequence of numbers
which appear to be quite haphazard or erratic in the order in which they occur. A
similar situation holds in tossing a coin successively where $X$ is the number of heads in a
single toss. In this case a succession of tosses will yield a haphazard sequence of
0's and 1's. Similarly, if $X$ is the blowing time in seconds of a fuse made under a
given set of specifications, then a sequence, let us say of every fuse from a
production line, will yield a sequence of numbers (values of $X$) which will have this
characteristic of haphazardness or randomness if there is nothing in the manufacturing
operations which will cause "peculiarities" in the sequence, such as excessive high or low
values, long runs of high or low values, etc. We make no attempt to define randomness
in observed sequences, except to describe it roughly as the erratic character of the
fluctuations usually found in sequences of measurements on operations repeatedly
performed under "essentially the same circumstances", as for example successively throwing
dice, tossing coins, drawing chips from a bowl, etc. In operations such as taking
fuses from a production line and making some measurement on each fuse (e. g. blowing
time) the resulting sequence of measurements frequently has "peculiarities" of the kind
mentioned above, thus lacking the characteristic of randomness. However, it has been
found that frequently a state of randomness similar to that produced by rolling dice,
drawing chips from a bowl, etc., can be obtained in such a process as mass production
by carefully controlling the production procedure.*

Now let us see what features of these empirical sequences which arise from
"randomizing" processes can be abstracted into a mathematical theory -- probability
theory. If we take the first $n$ numbers in an empirical sequence of numbers $X_1, X_2,
X_3, \ldots, X_n, \ldots$, there will be a certain fraction of them, say $F_n(x)$, less
than or equal to $x$, no matter what value of $x$ is taken. For each value of $x$,
$0 \le F_n(x) \le 1$. We shall refer to $F_n(x)$ as the empirical cumulative distribution
function of the numbers $X_1, X_2, \ldots, X_n$. As $x$ increases, $F_n(x)$ will either
increase or remain constant.
It is a matter of experience that as $n$ becomes larger and larger $F_n(x)$ becomes more
and more stable, appearing to approach some limit, say $F_\infty(x)$, for each value of $x$.

*Shewhart has developed a statistical method of quality control in mass production
engineering which is essentially a practical empirical procedure for approximating a state of
randomness (statistical control, to use Shewhart's term) for a given measurement in a
sequence of articles from a production line, by successively identifying and eliminating
causes of peculiarities in the sequence back in the materials and manufacturing
operations.
If any subsequence of the original sequence is chosen "at random" (i. e. according to any rule
which does not depend on the values of the $X$'s) then a corresponding $F_n(x)$ can be defined
for the subsequence, and again we know from experience that as $n$ increases, $F_n(x)$ for the
subsequence appears to approach the same limit for each value of $x$ as in the original
sequence.

Entirely similar experimental evidence exists for situations in which the
empirical sequences are sequences of pairs, triples, or sets of $k$ numbers, rather than
sequences of single numbers. For example, a sequence of throws of pairs of dice would
give rise to a sequence of pairs of numbers; the resistance, capacity, and inductance of
each relay in a sequence of telephone relays from a carefully controlled production line
would yield a sequence of triples of measurements. In considering a random sequence of
pairs of numbers $(X_{11}, X_{21}), (X_{12}, X_{22}), \ldots, (X_{1n}, X_{2n}), \ldots$, we can
let $F_n(x_1, x_2)$ be the proportion of pairs in the first $n$ pairs in which the value of
$X_1$ is less than or equal to $x_1$ and the value of $X_2$ is less than or equal to $x_2$.
We need not list all of the properties of $F_n(x_1, x_2)$, for they are straightforward
extensions of those of $F_n(x)$ considered above. The important point here is that as $n$
increases, experience indicates that $F_n(x_1, x_2)$ appears to approach some limit
$F_\infty(x_1, x_2)$ for each value of $x_1$ and of $x_2$.

In particular, suppose we group the numbers of an empirical random sequence
$X_1, X_2, \ldots, X_n, \ldots$ (with empirical cumulative distribution function $F_n(x)$) into
pairs (or samples of two numbers), so as to make a new sequence of pairs of numbers
$(X_1, X_2), (X_3, X_4), \ldots, (X_{2n-1}, X_{2n}), \ldots$. As before, we have an empirical
cumulative distribution function $F_n(x_1, x_2)$ for this sequence of pairs. It is an experimental
fact that as $n$ becomes larger and larger, $F_n(x_1, x_2)$ behaves more and more nearly like the
product $F_n(x_1) F_n(x_2)$. A similar situation is true for sequences of samples of three or
more numbers. As we shall see later, it is this product property that suggests a
way to set up a mathematical theory of sampling.
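The product property described above can be observed in a small simulation. The sketch below is only an illustration under stated assumptions: the die throws are produced by Python's pseudo-random generator rather than by a physical randomizing process, and the helper names `ecdf1` and `ecdf2`, the seed, and the evaluation point (2, 5) are all hypothetical choices.

```python
import random

def ecdf1(values, x):
    """Empirical c.d.f. F_n(x): fraction of the values that are <= x."""
    return sum(1 for v in values if v <= x) / len(values)

def ecdf2(pairs, x1, x2):
    """Empirical c.d.f. F_n(x1, x2): fraction of pairs with first <= x1 and second <= x2."""
    return sum(1 for a, b in pairs if a <= x1 and b <= x2) / len(pairs)

random.seed(1)  # arbitrary fixed seed so that the run is repeatable
seq = [random.randint(1, 6) for _ in range(200_000)]  # simulated die throws

# Group the sequence into pairs (X1, X2), (X3, X4), ...
pairs = list(zip(seq[0::2], seq[1::2]))

x1, x2 = 2, 5
joint = ecdf2(pairs, x1, x2)
product_of_marginals = ecdf1(seq, x1) * ecdf1(seq, x2)

print(f"F_n(x1, x2)       = {joint:.4f}")
print(f"F_n(x1) * F_n(x2) = {product_of_marginals:.4f}")
```

With a long sequence the two printed numbers should be close to one another, though neither the simulation nor the closeness of the two values proves anything; it only imitates the experience the text appeals to.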

The matter of $F_n(x)$ appearing to approach some function $F_\infty(x)$ as $n$ increases is
purely an empirical phenomenon, and not a mathematical one, but it suggests a way of
setting up a mathematical model corresponding to any randomizing process which, upon repeated
application, will yield an empirical sequence of numbers. We postulate the existence of a
function $F(x)$ (the properties of this function are given in §2.11) to serve as a
mathematical model for $F_\infty(x)$. In some situations such as coin tossing, dice throwing, etc.,
a complete numerical specification of $F(x)$ can be proposed by combinatorial and other a
priori considerations. In other situations of a more purely statistical nature it may be
impossible to specify $F(x)$ beyond a particular functional form involving certain parameters.
In attempting to relate the behavior of the empirical cumulative distribution
function $F_n(x)$ to the mathematical abstraction $F(x)$ one encounters at least two
difficulties: One is common to all mathematical theories of physical (chemical, biological,
sociological) phenomena employing limits: the mathematical process of passing through
an infinite number of steps is physically unrealizable, and is often impossible even as
a "thought-experiment". For example, let the reader consider the notion of mass or
charge density in the light of the fact that mass and charge are discrete. The other
difficulty is peculiar to probability theory in that the theory does not assert that
$\lim_{n \to \infty} F_n(x) = F(x)$, but that the approach* is in a sense defined within the
framework of the theory itself: $F_n(x)$ converges stochastically to $F(x)$. Stochastic
convergence is defined in .

Once $F(x)$ has been postulated, the mathematics begins and it consists of carrying
out various mathematical manipulations on $F(x)$ corresponding to certain operations which
can be performed on the sequence produced by the given randomizing process. The
mathematics then becomes a method of making predictions of what will happen if certain
operations are applied to the sequence. For example, $F(b) - F(a)$ is a prediction of the
proportion of times, in a large number of trials, that the given process will yield numbers
greater than $a$ and less than or equal to $b$; $\int_{-\infty}^{+\infty} x \, dF(x)$ (taken in
the Stieltjes sense, §2.5) is a prediction of the average of numbers obtained in a long series
of repeated applications of the process; $F(x_1) \cdot F(x_2)$ is a prediction of the proportion
of samples of pairs of numbers, out of a large number of such pairs, in which the first number
is $\le x_1$ and the second $\le x_2$; $\iint_R dF(x_1) \, dF(x_2)$, where $R$ is the region in
the $x_1 x_2$ plane for which $A < \tfrac{1}{2}(x_1 + x_2) \le B$, is a prediction of the
proportion of samples of pairs of numbers, out of a large number of such pairs, in which the
average of the sample pair lies between $A$ and $B$. Many other examples could be given here but
these will perhaps illustrate the nature of the correspondence between the mathematical
operations performed on $F(x)$ (i. e. probability theory) and calculations based on the results
of repeated applications of a given randomizing process. The degree of correspondence, i. e.
validity of prediction, depends on the degree of randomness in the empirical sequence and on how
well the function $F(x)$ has been chosen. That such predictions, correctly applied, have
practical validity has been experimentally verified many times.
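For a concrete case, the predictions listed above can be evaluated exactly when $F(x)$ is the c. d. f. postulated for a fair die. The following is a minimal sketch under that assumption; the probability 1/6 per face, the interval end points, and the bounds `A` and `B` are illustrative choices, and for a step function the Stieltjes integrals reduce to sums over the jumps.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)  # assumed probability of each face (fair die)

def F(x):
    """Postulated c.d.f. of a fair die: total probability of faces <= x."""
    return sum((p for k in faces if k <= x), Fraction(0))

# F(b) - F(a): predicted proportion of throws with a < X <= b.
a, b = 2, 5
prop_a_to_b = F(b) - F(a)

# Integral of x dF(x): for a step function this is the sum of x times the jump at x.
mean = sum(k * p for k in faces)

# F(x1) * F(x2): predicted proportion of pairs with first <= x1 and second <= x2.
pair_corner = F(2) * F(5)

# Double integral over R: pairs whose average satisfies A < (x1 + x2)/2 <= B.
A, B = 2, 4
pair_average = sum(
    (p * p for u, v in product(faces, repeat=2) if A < Fraction(u + v, 2) <= B),
    Fraction(0),
)

print(prop_a_to_b, mean, pair_corner, pair_average)
```

These are the idealized values; the proportions and averages found in an actual long run of throws would be compared against them.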

*See a study by V. I. Smirnoff, "Sur les écarts de la courbe de distribution empirique",
Recueil Mathématique, Moscow, vol. 6 (1939), pp. 25-26.
CHAPTER II


DISTRIBUTION FUNCTIONS

In this chapter we outline the basic probability theory necessary for the work
of the course. The treatment is general, the study of important particular distributions
being postponed to the next chapter.

2.1 Cumulative Distribution Functions

In the previous chapter we have introduced the notion of an empirical cumulative
distribution function (c. d. f.) $F_n(x)$, and have indicated that it is an experimental
fact that $F_n(x)$ appears to approach a limiting form $F_\infty(x)$ as $n$ is increased. We now
define a mathematical model $F(x)$ for the intuitively apprehended $F_\infty(x)$ by laying down
postulates for distribution functions. Henceforth the term cumulative distribution
function (c. d. f.) will be used only in the sense defined below.

We shall find it convenient to use the following notations and definitions
from point set theory: $P \in E$ signifies that the point $P$ belongs to the set $E$.
$E_1 \supset E_2$ is read "$E_1$ contains $E_2$". The sum (or union) of $E_1$ and $E_2$ is the
totality of points $P$ for which $P \in E_1$ or $E_2$; we shall denote it by $E_1 + E_2$. The
product (or intersection) of $E_1$ and $E_2$ is the totality of points $P$ for which $P \in$ both
$E_1$ and $E_2$; we write it $E_1 E_2$. $E_1$ and $E_2$ are said to be disjoint if they have no
points in common. The difference $E_1 - E_2$ is the totality of points in $E_1$ not in $E_2$.

2.11 Univariate Case

A c. d. f. $F(x)$ is defined by the following postulates:

1) If $x' < x''$, then $F(x'') - F(x') \ge 0$.

2) $F(-\infty) = 0$, $F(+\infty) = 1$.

The notation in (2) implies that the limits of $F(x)$ exist as $x \to -\infty$ or $+\infty$.
Since (1) means that $F(x)$ is monotone, it follows that $F(x)$ has at most an enumerable number
of discontinuities, and that the limits $F(x+0)$, $F(x-0)$ exist everywhere. The determination
of the values of $F(x)$ at its discontinuities is really not essential, but it will be
convenient to fix them by

3) $F(x+0) = F(x)$.
It follows from (1) and (2) that $F(x)$ is non-negative.

The relation between probability statements about a random variable* $X$ and its
c. d. f. is determined by the following further postulates:

1') $\Pr(X \le x) = F(x)$.

The left member is read "the probability that $X \le x$." Let $E_1, E_2, \ldots$ be a finite or
enumerable number of disjoint point sets on the x-axis:

2') $\Pr(X \in E_1 + E_2 + \ldots) = \Pr(X \in E_1) + \Pr(X \in E_2) + \ldots$

This may be called the law of complete additivity, and may be used to determine the term on
the left side of the equation, or any term on the right, when all the other probabilities
entering the equation are known. For example, let $I$ be the interval $x' < x \le x''$, $I'$ be
the interval $-\infty < x \le x'$, $I''$ be the interval $-\infty < x \le x''$. Then

$$I'' = I' + I.$$

From (1')

$$\Pr(X \in I') = F(x'), \qquad \Pr(X \in I'') = F(x''),$$

and hence from (2') we may state the theorem

A) $\Pr(x' < X \le x'') = F(x'') - F(x')$.


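In order to find the probability that $X$ be equal to a given value $x'$ take a sequence of
points $a_1 < a_2 < a_3 < \ldots$ converging to $x'$. Let $I$ be the interval
$a_1 < x \le x'$, and $I_j$ be the interval $a_j < x \le a_{j+1}$. Then, writing $x'$ also for
the set consisting of the single point $x'$,

$$I = x' + \sum_{j=1}^{\infty} I_j.$$

Hence from (2'),

$$\Pr(X \in I) = \Pr(X = x') + \sum_{j=1}^{\infty} \Pr(X \in I_j),$$

and from theorem (A),

$$F(x') - F(a_1) = \Pr(X = x') + \sum_{j=1}^{\infty} [F(a_{j+1}) - F(a_j)].$$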

*In this chapter it is convenient to denote a random variable by a capital letter, $X$,
etc., and the corresponding independent variable in the distribution function by the
corresponding lower case letter, $x$, etc. In later chapters we will drop this convention
when there is no danger of confusion.
Now

$$\sum_{j=1}^{\infty} [F(a_{j+1}) - F(a_j)] = \lim_{n \to \infty} \sum_{j=1}^{n} [F(a_{j+1}) - F(a_j)] = \lim_{n \to \infty} [F(a_{n+1}) - F(a_1)] = F(x'-0) - F(a_1).$$

Hence we have the theorem

B) $\Pr(X = x) = F(x) - F(x-0)$.

In a similar manner one may derive the following theorems:

C) $\Pr(x' < X < x'') = F(x''-0) - F(x')$,

   $\Pr(x' \le X \le x'') = F(x'') - F(x'-0)$,

   $\Pr(x' \le X < x'') = F(x''-0) - F(x'-0)$.

D) $0 \le \Pr(X \in E) \le 1$ (for any set $E$ for which the middle member is defined).

E) $\Pr(-\infty < X < +\infty) = 1$.


Let $E_1, E_2, \ldots$ be sets which are not necessarily disjoint, then

F) $\Pr(X \in E_1 + E_2) = \Pr(X \in E_1) + \Pr(X \in E_2) - \Pr(X \in E_1 E_2)$,

   $\Pr(X \in E_1 + E_2 + E_3) = \sum_{i} \Pr(X \in E_i) - \sum_{i<j} \Pr(X \in E_i E_j) + \Pr(X \in E_1 E_2 E_3)$, etc.
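Theorems (A), (B), (C) and (F) can be checked numerically on a small step-function c. d. f. The sketch below is an illustration only: the three jump points and their probabilities are hypothetical, `F_left` stands for the left-hand limit $F(x-0)$, and the sets $E_1$, $E_2$ are arbitrary overlapping sets chosen for the check.

```python
from fractions import Fraction

# Hypothetical distribution: jump points of F and the size of each jump.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}

def F(x):
    """c.d.f.: total probability at points <= x (so F(x+0) = F(x), postulate 3)."""
    return sum((q for pt, q in pmf.items() if pt <= x), Fraction(0))

def F_left(x):
    """Left-hand limit F(x-0): total probability at points < x."""
    return sum((q for pt, q in pmf.items() if pt < x), Fraction(0))

def pr(event):
    """Pr(X in E) computed directly from the jumps; `event` is a predicate on x."""
    return sum((q for pt, q in pmf.items() if event(pt)), Fraction(0))

a, b = 0, 3
theorem_A = pr(lambda x: a < x <= b) == F(b) - F(a)
theorem_B = all(pr(lambda x, c=c: x == c) == F(c) - F_left(c) for c in (0, 1, 2, 3))
theorem_C = (
    pr(lambda x: a < x < b) == F_left(b) - F(a)
    and pr(lambda x: a <= x <= b) == F(b) - F_left(a)
    and pr(lambda x: a <= x < b) == F_left(b) - F_left(a)
)

# Theorem (F) for two overlapping sets E1 = {x <= 1} and E2 = {x >= 1}.
in_e1 = lambda x: x <= 1
in_e2 = lambda x: x >= 1
theorem_F = pr(lambda x: in_e1(x) or in_e2(x)) == (
    pr(in_e1) + pr(in_e2) - pr(lambda x: in_e1(x) and in_e2(x))
)

print(theorem_A, theorem_B, theorem_C, theorem_F)
```

Exact rational arithmetic is used so that the equalities can be tested without rounding error; a check of this kind confirms the theorems for one example and is not a substitute for the derivations.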


We now characterize two important classes of c. d. f.'s:

(i) Suppose that $F(x)$ increases only by jumps, -- more precisely, suppose there is a
finite or an enumerable set of points $x_1, x_2, \ldots$, and corresponding positive numbers
$p_1, p_2, \ldots$; $\sum_i p_i = 1$, such that $F(x) = \sum p_j$ summed over all $j$ for which
$x_j \le x$. We shall call this the discrete case.* It may be shown from the theorem (B) that in
this case $\Pr(X = x_i) = p_i$, while for any point $x' \ne$ any $x_i$, $\Pr(X = x') = 0$.

If the number of $x_i$ is finite, or more generally, if the $x_i$ have no cluster
points except $\pm\infty$, then the graph of $F(x)$ in this case is a step-function made up of
horizontal lines as shown in (a) of Figure 1. The jump at $x = x_i$ is equal to $p_i$, the


*It should be noted that an empirical c. d. f. $F_n(x)$ of an observation variable $X$ has
properties (1), (2), (3) of a c. d. f. (discrete case), but does not have properties (1')
and (2'), although it has analogous properties. That is, corresponding to (1') we would
have $\mathrm{Prop}(X \le x) = F_n(x)$ (proportion of values of $X \le x$ is $F_n(x)$) and for
(2') we would have
$\mathrm{Prop}(X \in E_1 + E_2 + \ldots + E_k) = \mathrm{Prop}(X \in E_1) + \mathrm{Prop}(X \in E_2) + \ldots + \mathrm{Prop}(X \in E_k)$.
Thus, in the case of $F_n(x)$, $p_i$ would be the proportion of cases among the $n$ values of
$X$ in which the observation variable $= x_i$, and not the probability that $X = x_i$.
probability that $X = x_i$.

(ii) Another case is characterized by the existence of a function $f(x) \ge 0$
such that

$$F(x) = \int_{-\infty}^{x} f(\xi) \, d\xi.$$

This is really a necessary and sufficient condition for the absolute continuity of $F(x)$,
but instead of calling this the absolutely continuous case, we shall refer to it merely
as the continuous case. The graph of $F(x)$ in this case is continuous as shown in (b) of
Figure 1. We shall call $f(x)$ the probability density function of the random variable $X$.
The reader may show that in this case

$$\Pr(x' \le X \le x'') = \int_{x'}^{x''} f(\xi) \, d\xi,$$

and that the statement remains valid if one or both of the equality signs inside the
parentheses on the left are deleted. If $f(x)$ is continuous for $x' < x < x''$,

$$\Pr(x' \le X \le x'') = (x'' - x') f(x_1), \quad \text{where } x' < x_1 < x'',$$

and if $f(x)$ is continuous at $x_0$,

$$\Pr(x_0 \le X \le x_0 + dx) = f(x_0) \, dx,$$

except for infinitesimals of higher order. The infinitesimal $f(x) \, dx$ is sometimes called
the probability element of $X$.

The discrete and continuous cases thus defined obviously do not cover all
univariate c. d. f.'s, but we shall confine ourselves to these in the present course.
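The relations stated for the continuous case can be tried out numerically. The sketch below assumes, purely for illustration, the density $f(x) = e^{-x}$ for $x \ge 0$ (and zero otherwise), whose c. d. f. is $1 - e^{-x}$; the midpoint-rule helper `integrate`, the interval end points, and the step `dx` are hypothetical choices.

```python
import math

def f(x):
    """Assumed p.d.f. for illustration: exp(-x) for x >= 0, zero otherwise."""
    return math.exp(-x) if x >= 0 else 0.0

def F(x):
    """c.d.f. of the assumed density: integral of f from -infinity to x."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def integrate(g, lo, hi, steps=100_000):
    """Midpoint-rule approximation to the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))

# Pr(x' <= X <= x'') as an integral of f, compared with F(x'') - F(x').
x_lo, x_hi = 0.5, 2.0
by_integral = integrate(f, x_lo, x_hi)
by_cdf = F(x_hi) - F(x_lo)

# Probability element: Pr(x0 <= X <= x0 + dx) compared with f(x0) dx.
x0, dx = 1.0, 1e-4
exact = F(x0 + dx) - F(x0)
element = f(x0) * dx

print(by_integral, by_cdf)
print(exact, element)
```

The second pair of numbers differs only by a quantity of order `dx` squared, which is the sense in which $f(x_0)\,dx$ gives the probability "except for infinitesimals of higher order".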

[Figure 1. Graphs of $F(x)$: (a) discrete case, (b) continuous case.]

2.12 Bivariate Case



Let $J$ be a rectangle in the $x_1, x_2$ plane, $x_1' < x_1 \le x_1''$, $x_2' < x_2 \le x_2''$.
Denote by $\Delta_J^2 F(x_1, x_2)$ the second difference

$$\Delta_J^2 F(x_1, x_2) = F(x_1'', x_2'') + F(x_1', x_2') - F(x_1', x_2'') - F(x_1'', x_2').$$

Then a c. d. f. $F(x_1, x_2)$ is subjected to the following postulates:

1) $\Delta_J^2 F(x_1, x_2) \ge 0$.

2) $F(-\infty, x_2) = F(x_1, -\infty) = 0$, $F(+\infty, +\infty) = 1$.

By letting $x_1' \to -\infty$ in (1), we get with the aid of (2),

$$F(x_1, x_2'') - F(x_1, x_2') \ge 0 \quad \text{if } x_2'' > x_2',$$

and similarly

$$F(x_1'', x_2) - F(x_1', x_2) \ge 0 \quad \text{if } x_1'' > x_1',$$

so that $F(x_1, x_2)$ is monotonic in each variable separately. Hence the limits
$F(x_1 \pm 0, x_2)$, $F(x_1, x_2 \pm 0)$ exist everywhere. It can be shown that $F(x_1, x_2)$ is
discontinuous in $x_1$ at worst on an enumerable number of lines $x_1 =$ constant, and similarly
for $x_2$. If we let $x_1' \to -\infty$ and $x_2' \to -\infty$ in (1), we get
$F(x_1, x_2) \ge 0$ because of (2). The values of $F(x_1, x_2)$ at its discontinuities are
fixed by

3) $F(x_1, x_2) = F(x_1 + 0, x_2) = F(x_1, x_2 + 0)$.

The tie-up of probability statements about a vector random variable $X_1, X_2$ with
two components with its c. d. f. is determined by the following further postulates:

1') $\Pr(X_1 \le x_1, X_2 \le x_2) = F(x_1, x_2)$.

Let $E_1, E_2, \ldots$ be disjoint sets, then

2') $\Pr(X_1, X_2 \in E_1 + E_2 + \ldots) = \Pr(X_1, X_2 \in E_1) + \Pr(X_1, X_2 \in E_2) + \ldots$

By methods of §2.11 the reader may verify the following theorems:

A) $\Pr(X_1, X_2 \in J) = \Delta_J^2 F(x_1, x_2)$,
where $J$ and $\Delta_J^2$ are defined above.

B) $\Pr(x_1' < X_1 \le x_1'', X_2 = x_2) = F(x_1'', x_2) + F(x_1', x_2 - 0) - F(x_1', x_2) - F(x_1'', x_2 - 0)$.

C) $\Pr(X_1 = x_1, X_2 = x_2) = F(x_1, x_2) + F(x_1 - 0, x_2 - 0) - F(x_1 - 0, x_2) - F(x_1, x_2 - 0)$.
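Theorems (A) and (C) can be checked on a small discrete example. The sketch below is an illustration under assumed data: the joint distribution (first of two fair dice, and the larger of the two) is hypothetical and was chosen so that the two components are not independent, and since both components are integer-valued the left-hand limit $F(x - 0)$ is obtained as $F(x - 1)$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution: X1 = first die, X2 = larger of two dice.
pmf = {}
for first, second in product(range(1, 7), repeat=2):
    point = (first, max(first, second))
    pmf[point] = pmf.get(point, Fraction(0)) + Fraction(1, 36)

def F(x1, x2):
    """Bivariate c.d.f.: total probability at points (u, v) with u <= x1 and v <= x2."""
    return sum((q for (u, v), q in pmf.items() if u <= x1 and v <= x2), Fraction(0))

def second_difference(lo1, hi1, lo2, hi2):
    """Second difference of F over the cell lo1 < x1 <= hi1, lo2 < x2 <= hi2."""
    return F(hi1, hi2) + F(lo1, lo2) - F(lo1, hi2) - F(hi1, lo2)

def pr_cell(lo1, hi1, lo2, hi2):
    """Pr(X1, X2 in J) summed directly over the points of the cell."""
    return sum(
        (q for (u, v), q in pmf.items() if lo1 < u <= hi1 and lo2 < v <= hi2),
        Fraction(0),
    )

cell = (1, 4, 2, 5)
theorem_A = second_difference(*cell) == pr_cell(*cell)

# Theorem (C) at the point (3, 5); F(x - 0) = F(x - 1) for integer-valued variables.
x1, x2 = 3, 5
point_prob = F(x1, x2) + F(x1 - 1, x2 - 1) - F(x1 - 1, x2) - F(x1, x2 - 1)
theorem_C = point_prob == pmf.get((x1, x2), Fraction(0))

print(theorem_A, theorem_C, point_prob)
```

The same `second_difference` helper also gives a direct way to test postulate (1) on any chosen cell, since its value must never be negative.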

It can be shown by methods beyond the level of this course that from the postulates (1'),
(2') the probability that $X_1, X_2 \in E$ is determined for a very general class of regions
called Borel-measurable* regions.

D) $0 \le \Pr(X_1, X_2 \in E) \le 1$.

E) $\Pr(-\infty < X_1 < +\infty, -\infty < X_2 < +\infty) = 1$.

For sets $E_1, E_2, \ldots$, not necessarily disjoint,

F) Theorem (F) of §2.11 is valid.


With a bivariate distribution function we shall be mainly interested in the
discrete case and the continuous case, and occasionally a mixed case, all defined below.
We remark again that these categories are not exhaustive.

i) The discrete case** is characterized by the existence of a finite or
enumerable set of points $(x_{1i}, x_{2i})$, $i = 1, 2, \ldots$, and associated positive numbers
$p_i$ (probabilities) such that $F(x_1, x_2) = \sum p_j$ summed for all $j$ for which
$x_{1j} \le x_1$ and $x_{2j} \le x_2$. From theorem (C) it follows that
$\Pr(X_1 = x_{1i}, X_2 = x_{2i}) = p_i$ and for any point $(x_1', x_2')$ not in the
set $\Pr(X_1 = x_1', X_2 = x_2') = 0$.

ii) By the continuous case (see remarks in §2.11 about absolute continuity)
we shall understand that in which there exists a function $f(x_1, x_2) \ge 0$ such that

(a)    $$F(x_1, x_2) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} f(\xi_1, \xi_2) \, d\xi_2 \, d\xi_1.$$

We may show that

$$\Pr(X_1, X_2 \in J) = \iint_J f(x_1, x_2) \, dx_1 \, dx_2,$$


*In k-dimensional space a Borel-measurable region (or a Borel set) is one that is
obtainable from half-open intervals or cells, $x_i' < x_i \le x_i''$, $i = 1, 2, \ldots, k$, by
taking a finite or enumerable number of sums, differences, and products of such cells. A
function $f(x)$ is Borel-measurable if the set of values of $x$ for which $a < f(x) \le b$ is a
Borel set, where $a$ and $b$ are any two real numbers. A Borel-measurable function of two or
more variables is similarly defined.

**As in the case of one variable, it should be observed that an empirical c. d. f.
$F_n(x_1, x_2)$ of two observation variables $X_1, X_2$ has properties (1), (2), (3) of a
c. d. f. for two random variables (discrete case). But, in (1') and (2') one would use the term
"proportion of cases" instead of the term "probability of". The $p_i$ associated with the
isolated points would be called the proportion of cases for which $X_1 = x_{1i}$,
$X_2 = x_{2i}$, instead of the probability that $X_1 = x_{1i}$, $X_2 = x_{2i}$. The number of
such points would be $\le n$, the number of observed pairs of values of $X_1, X_2$.

This comparison of an empirical c. d. f. and the case of discrete variables
extends at once to the case of $k$ variables discussed in §2.13.
and that the result is not invalidated if $J$ is closed by the addition of its boundaries.
From this it follows that, except for infinitesimals of higher order,

$$\Pr(x_1 \le X_1 \le x_1 + dx_1, \; x_2 \le X_2 \le x_2 + dx_2) = f(x_1, x_2) \, dx_1 \, dx_2.$$

$f(x_1, x_2)$ and $f(x_1, x_2) \, dx_1 \, dx_2$ are called respectively the p. d. f.* and the
probability element of the random variables $X_1, X_2$.

iii) The mixed** case ($X_1$ continuous, $X_2$ discrete) is said to obtain if there
exists a finite or enumerable set of lines $x_2 = x_{2i}$, $i = 1, 2, \ldots$, associated
positive numbers $p_{2i}$, and a non-negative function of $x_1$ and $x_2$ defined for all $x_1$
and for $x_2 = x_{21}, x_{22}, \ldots$, which function we shall write as $f(x_1 | x_{2i})$, such
that

$$\int_{-\infty}^{+\infty} f(\xi_1 | x_{2j}) \, d\xi_1 = 1,$$

$$F(x_1, x_2) = \sum_j p_{2j} \int_{-\infty}^{x_1} f(\xi_1 | x_{2j}) \, d\xi_1, \quad \text{summed over all } j \text{ for which } x_{2j} \le x_2.$$

In the mixed case $p_{2i}$ is the probability that the random point $X_1, X_2$ will fall on the
line $x_2 = x_{2i}$, and $f(x_1 | x_{2i}) \, dx_1$ is the probability (to within terms of order
$dx_1$) that $x_1 < X_1 < x_1 + dx_1$ if the random point falls on the line $x_2 = x_{2i}$.

It may be shown from our postulates that for any (B-meas.) region $E$ in the
$x_1 x_2$-plane we get in the three cases

$\Pr(X_1, X_2 \in E) =$

   i) $\sum p_j$ summed over all $j$ such that $(x_{1j}, x_{2j}) \in E$,

   ii) $\iint_E f(x_1, x_2) \, dx_1 \, dx_2$,

   iii) $\sum_i p_{2i} \int_{E_i} f(x_1 | x_{2i}) \, dx_1$, where $E_i$ is the projection on the
        $x_1$ axis of the part of the line $x_2 = x_{2i}$ lying in $E$. (If the line does not
        intersect $E$, the corresponding integral is zero.)

By means of the Stieltjes integral (§2.5) these three cases may be brought under the
single expression $\Pr(X_1, X_2 \in E) = \int_E dF(x_1, x_2)$, which includes indeed the most
general case.

2.13 k-Variate Case

A k-variate c. d. f. $F(x_1, x_2, \ldots, x_k)$ must satisfy the following three
postulates: Let $J$ be the k-dimensional cell $x_i' < x_i \le x_i''$, $i = 1, 2, \ldots, k$, and
define the k-th difference

*probability density function

**The reader will understand this case better if he rereads this description after having
mastered §2.4.
$$\Delta_J^k F(x_1, x_2, \ldots, x_k) = \Delta_1 \Delta_2 \cdots \Delta_k F(x_1, x_2, \ldots, x_k),$$

where the operators are applied successively and denote

$$\Delta_i F(x_1, x_2, \ldots, x_k) = F(x_1, \ldots, x_{i-1}, x_i'', x_{i+1}, \ldots, x_k) - F(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_k).$$

1) $\Delta_J^k F(x_1, x_2, \ldots, x_k) \ge 0$.

2) $F(-\infty, x_2, \ldots, x_k) = F(x_1, -\infty, x_3, \ldots, x_k) = \ldots = F(x_1, \ldots, x_{k-1}, -\infty) = 0$, $F(+\infty, +\infty, \ldots, +\infty) = 1$.

3) $F(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_k) = F(x_1, \ldots, x_{i-1}, x_i + 0, x_{i+1}, \ldots, x_k)$.

As in the bivariate case it can be shown from (1) and (2) that $F$ is monotonic in each
variable separately and that $F$ is monotonic (in the sense of (1)) in any set of variables
if the remainder are held fixed.
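Applying the $k$ difference operators successively amounts to a signed sum of $F$ over the $2^k$ corners of the cell $J$. The sketch below illustrates this expansion; the function name `kth_difference` and the example c. d. f. (that of $k$ independent variables, each uniform on the unit interval) are assumptions made for the illustration and do not come from the text.

```python
from itertools import product

def kth_difference(F, lower, upper):
    """k-th difference of F over the cell lower[i] < x_i <= upper[i], i = 1..k.

    Expanding the successive one-variable differences gives a signed sum over
    the 2^k corners; the sign is (-1) raised to the number of lower ends used.
    """
    k = len(lower)
    total = 0.0
    for choice in product((0, 1), repeat=k):  # 0 -> lower end, 1 -> upper end
        corner = [upper[i] if c else lower[i] for i, c in enumerate(choice)]
        sign = (-1) ** (k - sum(choice))
        total += sign * F(*corner)
    return total

def uniform_cube_cdf(*xs):
    """Assumed example c.d.f.: independent variables, each uniform on (0, 1)."""
    value = 1.0
    for x in xs:
        value *= min(max(x, 0.0), 1.0)
    return value

# For this c.d.f. the k-th difference over a cell inside the unit cube is its volume.
cell_probability = kth_difference(uniform_cube_cdf, (0.1, 0.2, 0.3), (0.5, 0.6, 0.9))
print(cell_probability)
```

For $k = 2$ the signed sum reduces to the four-term second difference of §2.12, and postulate (1) requires the result to be non-negative for every cell.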

A random vector variable $X = (X_1, X_2, \ldots, X_k)$ is said to have the c. d. f.
$F(x_1, x_2, \ldots, x_k)$, -- or the random variables $X_1, X_2, \ldots, X_k$ are said to be
jointly distributed with the c. d. f. $F(x_1, x_2, \ldots, x_k)$ -- if furthermore

1') $\Pr(X_1 \le x_1, X_2 \le x_2, \ldots, X_k \le x_k) = F(x_1, x_2, \ldots, x_k)$.

If $E_1, E_2, \ldots$ are a finite or enumerable number of disjoint sets,

2') $\Pr(X \in E_1 + E_2 + \ldots) = \Pr(X \in E_1) + \Pr(X \in E_2) + \ldots$

By the methods used before we may now generalize the theorems (A) to (F) of §§2.11 and
2.12.

The discrete case and the continuous case are defined by obvious generalization of
§2.12, and it is evident how mixed cases of various orders would now be defined.

2.2 Marginal Distributions

Suppose the joint c. d. f. of the random variables $X_1, X_2$ is $F(x_1, x_2)$, and
consider the probability that $X_1 \le x_1$, without any condition on $X_2$:

$$\Pr(X_1 \le x_1) = \Pr(X_1 \le x_1, X_2 < +\infty) = F(x_1, +\infty).$$

This is called the marginal distribution of $X_1$. We note that it is a bona fide
distribution function as defined in §2.11. In fact, it is the univariate c. d. f. of $X_1$.
Similarly, we define $F(+\infty, x_2)$ as the marginal distribution of $X_2$.

For the discrete case defined in §2.12 we then have

$$F(x_1, +\infty) = \sum p_j \quad \text{summed for all } j \text{ such that } x_{1j} \le x_1.$$
For the continuous case,

(a)    $$F(x_1, +\infty) = \int_{-\infty}^{x_1} \int_{-\infty}^{+\infty} f(\xi_1, \xi_2) \, d\xi_2 \, d\xi_1 = \int_{-\infty}^{x_1} f_1(\xi_1) \, d\xi_1,$$

where

$$f_1(x_1) = \int_{-\infty}^{+\infty} f(x_1, \xi) \, d\xi.$$

$f_1(x_1)$ may be called the marginal p. d. f. of $X_1$.

In the trivariate case we get besides the marginal distribution of each random
variable separately, for example,

$$F(x_1, +\infty, +\infty) = \Pr(X_1 \le x_1),$$

also marginal distributions of pairs of random variables, for example,

$$F(x_1, x_2, +\infty) = \Pr(X_1 \le x_1, X_2 \le x_2).$$

For a k-variate distribution one likewise defines marginal distributions of the random
variables taken one at a time, in pairs, ..., $k-1$ at a time. We note that all these
marginal distributions satisfy the postulates (1), (2), (3), (1'), (2') for a c. d. f.
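In the discrete case the marginal c. d. f. is obtained simply by letting the other argument go to $+\infty$ in the sum defining $F$. The sketch below illustrates this on a hypothetical four-point joint distribution; the probabilities and the names `F1`, `F2` are assumptions made for the example, and `math.inf` plays the role of $+\infty$.

```python
import math
from fractions import Fraction

# Hypothetical joint probabilities p_j attached to the points (x_1j, x_2j).
pmf = {
    (0, 0): Fraction(1, 8),
    (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
}

def F(x1, x2):
    """Joint c.d.f.: sum of p_j over points with x_1j <= x1 and x_2j <= x2."""
    return sum((q for (u, v), q in pmf.items() if u <= x1 and v <= x2), Fraction(0))

def F1(x1):
    """Marginal c.d.f. of X1, namely F(x1, +infinity)."""
    return F(x1, math.inf)

def F2(x2):
    """Marginal c.d.f. of X2, namely F(+infinity, x2)."""
    return F(math.inf, x2)

print(F1(0), F1(1), F2(0), F2(1))
```

Each marginal is again a step function that rises from 0 to 1, in line with the remark that marginal distributions satisfy the postulates for a c. d. f.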

2.3 Statistical Independence

If $F(x_1, x_2)$ is the c. d. f. of $X_1, X_2$, then from §2.2,

$$F_1(x_1) = F(x_1, +\infty), \qquad F_2(x_2) = F(+\infty, x_2)$$

are the marginal distributions of $X_1$ and $X_2$, respectively. We say that the random
variables $X_1, X_2$ are independent in the probability sense, or statistically independent, if

(a)    $$F(x_1, x_2) = F_1(x_1) F_2(x_2).$$

It is easily seen that a necessary and sufficient condition for the statistical
independence of $X_1$ and $X_2$ is that their joint c. d. f. factor into a function of $x_1$
alone times a function of $x_2$ alone, i. e.,

$$F(x_1, x_2) = G(x_1) H(x_2).$$

In order to see the probability implications of statistical independence,
consider any two intervals $I_1$ and $I_2$ on the $x_1$ and $x_2$-axes, respectively,

$$I_1: \; x_1' < x_1 \le x_1'',$$

$$I_2: \; x_2' < x_2 \le x_2'',$$







and let $J$ be the rectangle of points $(x_1, x_2)$ satisfying both these inequalities. Then

(b)    $$\Pr(X_1, X_2 \in J) = \Pr(X_1 \in I_1) \cdot \Pr(X_2 \in I_2).$$

For, by hypothesis, we have (a); hence

$$\Pr(X_1, X_2 \in J) = \Delta_J^2 F(x_1, x_2) = F_1(x_1'') F_2(x_2'') + F_1(x_1') F_2(x_2') - F_1(x_1') F_2(x_2'') - F_1(x_1'') F_2(x_2').$$

After factoring the right member into $[F_1(x_1'') - F_1(x_1')][F_2(x_2'') - F_2(x_2')]$ we
easily get (b).

By the same method, and with the aid of Theorem (B) of §2.11 and Theorem (C)
of §2.12 we get that if $X_1$ and $X_2$ are statistically independent, then

$$\Pr(X_1 = x_1, X_2 = x_2) = \Pr(X_1 = x_1) \cdot \Pr(X_2 = x_2).$$

This is of importance for the discrete case. For the continuous case we may state the
following result: If $f(x_1, x_2)$ is the joint p. d. f. of $X_1, X_2$, if $f_j(x_j)$ is the
marginal p. d. f. of $X_j$, $j = 1, 2$, and if $X_1, X_2$ are statistically independent, then

$$f(x_1, x_2) = f_1(x_1) f_2(x_2)$$

wherever $f(x_1, x_2)$ is continuous. At the points of continuity, we have from equation (a)
of §2.12,

$$f(x_1, x_2) = \frac{\partial^2 F(x_1, x_2)}{\partial x_1 \, \partial x_2} = \frac{\partial F_1(x_1)}{\partial x_1} \frac{\partial F_2(x_2)}{\partial x_2} = f_1(x_1) f_2(x_2),$$

the last step following from (a) of §2.2.
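For a finite discrete distribution, definition (a) of statistical independence can be tested directly, because $F$ and its marginals are step functions and it is enough to compare them on the grid formed by the jump coordinates. The sketch below rests on that observation; the helper names and the two example distributions (two fair dice, and a die paired with the larger of two dice) are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product

def make_cdf(pmf):
    """Return the joint c.d.f. F(x1, x2) belonging to a dict of point probabilities."""
    def F(x1, x2):
        return sum((q for (u, v), q in pmf.items() if u <= x1 and v <= x2), Fraction(0))
    return F

def is_independent(pmf):
    """Check F(x1, x2) = F1(x1) * F2(x2) on the grid of jump coordinates."""
    F = make_cdf(pmf)
    xs1 = sorted({u for u, _ in pmf})
    xs2 = sorted({v for _, v in pmf})
    return all(
        F(a, b) == F(a, math.inf) * F(math.inf, b) for a, b in product(xs1, xs2)
    )

# Two separate fair dice: every pair of faces has probability 1/36.
two_dice = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}

# First die together with the larger of the two dice: the components are related.
die_and_max = {}
for a, b in product(range(1, 7), repeat=2):
    point = (a, max(a, b))
    die_and_max[point] = die_and_max.get(point, Fraction(0)) + Fraction(1, 36)

print(is_independent(two_dice), is_independent(die_and_max))
```

The first distribution factors and the second does not, since for instance the larger of two dice can never be smaller than the first die.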

$k$ random variables are said to be mutually (statistically) independent if their
joint c. d. f. is of the form

$$F(x_1, x_2, \ldots, x_k) = F_1(x_1) F_2(x_2) \cdots F_k(x_k),$$

where $F_j(x_j)$ is the marginal distribution of $X_j$. Two random vector variables
$X_i = (X_{i1}, X_{i2}, \ldots, X_{ik_i})$, $i = 1, 2$, are called statistically independent if
the joint c. d. f. of the $k_1 + k_2$ components is the product of the marginal distributions of
$X_1$ and $X_2$:

$$F(x_{11}, \ldots, x_{1k_1}; x_{21}, \ldots, x_{2k_2}) = F(x_{11}, \ldots, x_{1k_1}; +\infty, \ldots, +\infty) \, F(+\infty, \ldots, +\infty; x_{21}, \ldots, x_{2k_2}).$$
The definition of the statistical Independence of n vector random variables Is made as 
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the obvious generalization. 

The concept of statistical independence is fundamental in sampling theory (§4.1). $n$ random variables are said to constitute a random sample from a population with c. d. f. $F(x)$ if their joint c. d. f. is $F(x_1)F(x_2)\cdots F(x_n)$. If the population distribution is $k$-variate with c. d. f. $F(x_1, x_2, \ldots, x_k)$, then the $n$ vector variables $X_\alpha = (X_{1\alpha}, X_{2\alpha}, \ldots, X_{k\alpha})$, $\alpha = 1, 2, \ldots, n$, are said to be a random sample if the joint c. d. f. of the set $\{X_{j\alpha}\}$ is

$$\prod_{\alpha=1}^{n} F(x_{1\alpha}, x_{2\alpha}, \ldots, x_{k\alpha}).$$

2.4 Conditional Probability 

Let $X$ be a random variable, and let $R$ be any (Borel) set of points on the $x$-axis. Let $E$ be any (Borel) subset of $R$. If $\Pr(X \in R) \ne 0$, we define the conditional probability $\Pr(X \in E \mid X \in R)$, read "the probability that $X$ is in $E$, given that $X$ is in $R$", as

(a) $$\Pr(X \in E \mid X \in R) = \frac{\Pr(X \in E)}{\Pr(X \in R)}.$$

The definition (a) extends immediately to any finite number of random variables. For example, for two random variables $X_1, X_2$, $R$ would represent a (Borel) set in the $x_1x_2$ plane and $E$ would be a subset of $R$.
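To make definition (a) concrete, the sketch below evaluates it exactly for a small discrete case. The distribution (a fair die) and the helper names `prob` and `conditional_prob` are assumptions introduced only for this illustration.

```python
from fractions import Fraction

# Hypothetical example: X is the face shown by a fair die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def prob(event):
    """Pr(X in event) for a set of faces."""
    return sum(p for x, p in pmf.items() if x in event)


def conditional_prob(E, R):
    """Pr(X in E | X in R) as in (a); E must be a subset of R and Pr(X in R) != 0."""
    E, R = set(E), set(R)
    if not E <= R:
        raise ValueError("E must be a subset of R")
    pr_R = prob(R)
    if pr_R == 0:
        raise ZeroDivisionError("Pr(X in R) is zero, so (a) does not apply")
    return prob(E) / pr_R


if __name__ == "__main__":
    # Probability of a two, given that the face is even.
    print(conditional_prob({2}, {2, 4, 6}))
```

Exact rational arithmetic is used so that the quotient in (a) is not blurred by rounding.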

Of particular interest is the case in which $R$ is a set in the $x_1x_2$ plane for which $X_2 \in E_2$, where $E_2$ is any (Borel) set in the domain of $X_2$, and $E$ is the product or intersection set between $R$ and a similar set for which $X_1 \in E_1$, where $E_1$ is any (Borel) set in the domain of $X_1$. Here we may write $E = E_1E_2$. The simplest case is that in which $E_1$ is an interval $x_1' < x_1 \le x_1''$ and $E_2$ is an interval $x_2' < x_2 \le x_2''$. Then $R$ is the horizontal strip $x_2' < x_2 \le x_2''$, and $E$ is the rectangle for which $x_1' < x_1 \le x_1''$ and $x_2' < x_2 \le x_2''$. In the present case, expression (a) may be written in the form

(b) $$\Pr(X_1 \in E_1 \mid X_2 \in E_2) = \frac{\Pr(X_1, X_2 \in E)}{\Pr(X_2 \in E_2)}.$$

Because of symmetry, we may also write

$$\Pr(X_2 \in E_2 \mid X_1 \in E_1) = \frac{\Pr(X_1, X_2 \in E)}{\Pr(X_1 \in E_1)}.$$

In a similar manner we may write for the case of three variables

$$\Pr(X_3 \in E_3 \mid X_1, X_2 \in E_1E_2) = \frac{\Pr(X_1, X_2, X_3 \in E_1E_2E_3)}{\Pr(X_1, X_2 \in E_1E_2)},$$

and so on for any number of variables. The relation (b) may of course be expressed in terms of distribution functions. In particular, if $X_1, X_2$ have a bivariate p. d. f. $f(x_1, x_2)$, and $E_1$ is the set

(c) $$x_1' \le x_1 \le x_1''$$

on the $x_1$-axis, and $E_2$ is the set

(d) $$x_2' \le x_2 \le x_2' + h$$

on the $x_2$-axis, then $E$ is the rectangle in the $x_1x_2$-plane defined by (c) and (d). Equation (b) becomes

(e) $$\Pr(x_1' \le X_1 \le x_1'' \mid x_2' \le X_2 \le x_2' + h) = \frac{\displaystyle\int_{x_2'}^{x_2'+h}\int_{x_1'}^{x_1''} f(x_1, x_2)\,dx_1\,dx_2}{\displaystyle\int_{x_2'}^{x_2'+h} f_2(x_2)\,dx_2}$$

if the denominator does not vanish. If $f(x_1, x_2)$ is continuous in the rectangle $E$, the denominator may be written

$$h\,f_2(\xi_2), \quad\text{where } x_2' < \xi_2 < x_2' + h,$$

and the numerator,

$$\int_{x_1'}^{x_1''} h\,f(x_1, \eta_2(x_1))\,dx_1, \quad\text{where } x_2' < \eta_2(x_1) < x_2' + h.$$

(e) may then be written

(f) $$\int_{x_1'}^{x_1''} \left[f(x_1, \eta_2(x_1))/f_2(\xi_2)\right]dx_1.$$

We note that the integrand, for fixed $x_2'$ and $h$, has the properties of a univariate p. d. f. We next assume that $f_2(x_2') \ne 0$. Noting that $\Pr(x_1' \le X_1 \le x_1'' \mid X_2 = x_2')$ is not defined by (b), we now define it as the limit of (e) as $h \to 0$. The continuity we have already assumed is sufficient to justify our taking limits under the integral sign in (f); the result is

$$\Pr(x_1' \le X_1 \le x_1'' \mid X_2 = x_2') = \int_{x_1'}^{x_1''} f(x_1 \mid x_2')\,dx_1,$$

where

(g) $$f(x_1 \mid x_2) = f(x_1, x_2)/f_2(x_2).$$

For fixed $x_2$, $f(x_1 \mid x_2)$ again has the properties of a univariate p. d. f.; it may be called the conditional p. d. f. of $X_1$, given $X_2$. We note that if $X_1$ and $X_2$ are statistically independent, $f(x_1 \mid x_2) = f_1(x_1)$.
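The limiting process that leads from (e) to the conditional p. d. f. (g) can be watched numerically. The sketch below is only an illustration under assumptions: the joint p. d. f. $f(x_1, x_2) = x_1 + x_2$ on the unit square is a hypothetical choice, and the integrals are approximated by a midpoint rule.

```python
def f(x1, x2):
    """Assumed joint p.d.f. on the unit square: f = x1 + x2, zero elsewhere."""
    return x1 + x2 if 0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0 else 0.0


def f2(x2):
    """Marginal p.d.f. of X2: the integral of f over x1 in [0, 1]."""
    return x2 + 0.5 if 0.0 <= x2 <= 1.0 else 0.0


def midpoint_integral(g, lo, hi, n=400):
    """Midpoint-rule approximation to the integral of g over [lo, hi]."""
    width = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * width) for i in range(n)) * width


def ratio_e(a, b, c, h):
    """Expression (e): Pr(a <= X1 <= b | c <= X2 <= c + h)."""
    numerator = midpoint_integral(
        lambda x2: midpoint_integral(lambda x1: f(x1, x2), a, b), c, c + h)
    denominator = midpoint_integral(f2, c, c + h)
    return numerator / denominator


def conditional_limit(a, b, c):
    """Integral over [a, b] of the conditional p.d.f. (g) evaluated at x2 = c."""
    return midpoint_integral(lambda x1: f(x1, c) / f2(c), a, b)


if __name__ == "__main__":
    for h in (0.1, 0.01, 0.0001):
        print(h, ratio_e(0.2, 0.7, 0.4, h), conditional_limit(0.2, 0.7, 0.4))
```

As $h$ decreases, the value of (e) should approach the integral of the conditional p. d. f., as the definition requires.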

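Likewise, if random variables $X_{11}, \ldots, X_{1k_1}; X_{21}, \ldots, X_{2k_2}$ have a joint p. d. f. $f(x_{11}, \ldots, x_{1k_1}; x_{21}, \ldots, x_{2k_2})$, we define the conditional p. d. f.

(h) $$f(x_{11}, \ldots, x_{1k_1} \mid x_{21}, \ldots, x_{2k_2}) = \frac{f(x_{11}, \ldots, x_{1k_1}; x_{21}, \ldots, x_{2k_2})}{\displaystyle\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty} f(x_{11}, \ldots, x_{1k_1}; x_{21}, \ldots, x_{2k_2})\,dx_{11}\cdots dx_{1k_1}},$$

if the denominator is not zero.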

2.5 The Stieltjes Integral

An important tool in mathematical statistics, which often permits a common treatment of the discrete and continuous cases (and indeed the most general case), is the Stieltjes integral.

2.51 Univariate Case

We begin by defining the Stieltjes integral over a finite half-open interval $a < x \le b$: Suppose we have two functions, $\phi(x)$ continuous for $a \le x \le b$, and $F(x)$ monotone for $a \le x \le b$. We subdivide $(a, b)$ into subintervals $I_j$: $(x_{j-1}, x_j)$ by means of points $x_0 = a < x_1 < x_2 < \ldots < x_m = b$. In each interval we pick an arbitrary point $\xi_j \in I_j$. Denote by $\Delta_j F(x)$ the difference $F(x_j) - F(x_{j-1})$, and form the sum

$$S = \sum_{j=1}^{m} \phi(\xi_j)\,\Delta_j F(x).$$

If $U_j$ is the maximum of $\phi(x)$ in $I_j$, and $L_j$ the minimum, then

$$L_j \le \phi(\xi_j) \le U_j,$$

(a) $$S_L \le S \le S_U,$$

where

$$S_L = \sum_{j=1}^{m} L_j\,\Delta_j F(x), \qquad S_U = \sum_{j=1}^{m} U_j\,\Delta_j F(x).$$

Let $\epsilon = \max_j (U_j - L_j)$. Then

$$0 \le S_U - S_L = \sum_{j=1}^{m} (U_j - L_j)\,\Delta_j F(x) \le \epsilon \sum_{j=1}^{m} \Delta_j F(x) = \epsilon\,[F(b) - F(a)].$$

Hence if the intervals $I_j$ are further subdivided, and this process is continued in such a way that the norm of the subdivision, $\delta = \max_j (x_j - x_{j-1})$, approaches zero, then since $\phi(x)$ is uniformly continuous on $(a, b)$, $\epsilon \to 0$, and hence

$$S_U - S_L \to 0.$$

It is easily seen that $S_L$ is non-decreasing, and $S_U$ non-increasing, as the subdivision is made finer, and hence from (a), $S$ approaches a limit. Since $S_U$ and $S_L$ are independent of the choice of the arbitrary point $\xi_j$ in $I_j$, therefore from (a), $\lim_{\delta \to 0} S$ is likewise independent of this choice. Furthermore, $\lim_{\delta \to 0} S$ may be shown to be independent of the method of subdivision. We call this limit the Stieltjes integral of $\phi(x)$ with respect to $F(x)$ over the range $a < x \le b$ and denote it by

$$\int_a^b \phi(x)\,dF(x) = \lim_{\delta \to 0} S.$$
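The sum $S$ defined above is easily computed, and doing so for a fine subdivision gives a feeling for the limit. The following Python sketch is an illustration only: the two functions `F_smooth` and `F_step`, the integrand `phi`, and the use of equal subintervals with midpoints as the arbitrary points are all assumptions made for the example.

```python
def stieltjes_sum(phi, F, a, b, m):
    """S = sum over j of phi(xi_j) * [F(x_j) - F(x_{j-1})] for m equal subintervals.

    xi_j is taken as the midpoint of the j-th subinterval; any other point of
    the subinterval would lead to the same limit.
    """
    total = 0.0
    for j in range(1, m + 1):
        left = a + (b - a) * (j - 1) / m
        right = a + (b - a) * j / m
        xi = 0.5 * (left + right)
        total += phi(xi) * (F(right) - F(left))
    return total


def F_smooth(x):
    """Assumed continuous c.d.f.: F(x) = x**2 on (0, 1)."""
    return min(max(x, 0.0), 1.0) ** 2


def F_step(x):
    """Assumed discrete c.d.f.: jumps of 0.3 at x = 0.25 and 0.7 at x = 0.5."""
    return (0.3 if x >= 0.25 else 0.0) + (0.7 if x >= 0.5 else 0.0)


def phi(x):
    return x * x


if __name__ == "__main__":
    for m in (10, 100, 1000):
        print(m, stieltjes_sum(phi, F_smooth, 0.0, 1.0, m),
              stieltjes_sum(phi, F_step, 0.0, 1.0, m))
```

For `F_smooth` the sums should approach $\int_0^1 x^2 \cdot 2x\,dx = 1/2$, and for `F_step` they should approach $0.3\,\phi(0.25) + 0.7\,\phi(0.5) = 0.19375$, so the same definition covers both the continuous and the discrete case.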

Let us examine further the significance of the Stieltjes integral when $F(x)$ is a c. d. f. in the discrete or continuous cases: Suppose that $F(x)$ is a discrete c. d. f. with only a finite number $n$ of jumps of amount $p_k$ at the points $a_k$ in the interval $(a, b)$. We may assume that the points are ordered,

(b) $$a < a_1 < a_2 < \ldots < a_n \le b.$$

Since the points are isolated, eventually for any mode of continued subdivision, each interval $I_j$ will contain not more than one point $a_k$ in its interior or as right end point. If $I_j$ contains $a_k$, that is if $x_{j-1} < a_k \le x_j$, denote it by $I_{j_k}$, and call the arbitrary point $\xi_j$ in this interval $\xi_{j_k}$. Then

$$\Delta_j F(x) = \begin{cases} p_k & \text{if } I_j = I_{j_k}, \\ 0 & \text{if } I_j \text{ is not an } I_{j_k}. \end{cases}$$

Hence

$$S = \sum_{k=1}^{n} \phi(\xi_{j_k})\,p_k.$$

Now as the norm $\delta \to 0$, $\xi_{j_k} \to a_k$, and thus

(c) $$\int_a^b \phi(x)\,dF(x) = \sum_{k=1}^{n} \phi(a_k)\,p_k.$$

It will be noted that the continuity of $\phi(x)$ at the points $a_k$ is essential. The result (c) may be shown to remain valid in the case where there is an infinite number of points of discontinuity of $F(x)$ in $(a, b)$.



In the continuous case at points of continuity of the p. d. f. $f(x)$ we have

$$dF(x)/dx = f(x), \qquad dF(x) = f(x)\,dx,$$

and hence we might write heuristically

(d) $$\int_a^b \phi(x)\,dF(x) = \int_a^b \phi(x)\,f(x)\,dx.$$

The relation (d) may be proved as follows: We first assume that $f(x)$ is continuous on $(a, b)$. Then in each interval $I_j$ we pick as $\xi_j$ the point for which

$$\Delta_j F(x) = F'(\xi_j)(x_j - x_{j-1}).$$

The existence of such a point is guaranteed by the mean value theorem. Then

$$S = \sum_{j=1}^{m} \phi(\xi_j)\,f(\xi_j)(x_j - x_{j-1}).$$

But by the so-called fundamental theorem of the calculus (actually, the definition of the ordinary definite integral), the limit of this sum as the norm approaches zero is the right member of (d). The proof can be extended to the case where $f(x)$ has discontinuities on $(a, b)$.


We shall have need of the Stieltjes integral over an infinite interval. We define it as

(e) $$\int_{-\infty}^{+\infty} \phi(x)\,dF(x) = \lim_{\substack{a \to -\infty \\ b \to +\infty}} \int_a^b \phi(x)\,dF(x)$$

if and only if the limit exists as $a \to -\infty$ and $b \to +\infty$ independently. In more advanced work it is sometimes convenient to consider

(f) $$\lim_{T \to +\infty} \int_{-T}^{+T} \phi(x)\,dF(x).$$

This limit of course exists whenever (e) does, but the converse is false. (f) is called the Cauchy principal value of the infinite integral. Unless the contrary is explicitly stated, we shall always understand that the infinite integral connotes (e).
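That (f) may exist when (e) does not is seen in a familiar example, offered here as an illustration outside the original text: take $\phi(x) = x$ and let $F$ be the Cauchy distribution with p. d. f. $1/[\pi(1 + x^2)]$. The Python sketch below uses the elementary antiderivative $\log(1 + x^2)/(2\pi)$; the function name `truncated_mean` is hypothetical.

```python
import math


def truncated_mean(a, b):
    """Integral of x * f(x) over (a, b] for the Cauchy p.d.f. f(x) = 1 / (pi * (1 + x**2)).

    An antiderivative of x * f(x) is log(1 + x**2) / (2 * pi).
    """
    return (math.log1p(b * b) - math.log1p(a * a)) / (2.0 * math.pi)


if __name__ == "__main__":
    for T in (1e2, 1e4, 1e6):
        symmetric = truncated_mean(-T, T)        # limits tied together, as in (f)
        lopsided = truncated_mean(-T, 2.0 * T)   # b grows twice as fast as -a
        print(T, symmetric, lopsided)
```

The symmetric truncation gives zero for every $T$, while letting $b$ grow twice as fast as $-a$ gives values tending to $\log 4/(2\pi)$; since the result depends on how the two limits are taken, the integral does not exist in the sense of (e).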

An intuitive explanation of the meaning of the Stieltjes integral will be given in §2.53, where we shall also indicate how the Stieltjes integral may be generalized over any range which is a Borel set $E$. In the univariate case, the various expressions for $\Pr(X \in E)$ introduced in §2.11 may then all be summarized under

$$\Pr(X \in E) = \int_E dF(x).$$

2.52 Bivariate Case

We limit our definition to the case where $F(x_1, x_2)$ is a c. d. f. as defined in §2.12. Let $J$ be the half-open cell

(a) $$J:\ a_1 < x_1 \le b_1, \quad a_2 < x_2 \le b_2.$$

We assume $\phi(x_1, x_2)$ is continuous on $J$ (boundaries included). By means of lines parallel to the axes, subdivide $J$ into rectangles $J_j$, $j = 1, 2, \ldots, m$. Let the norm $\delta$ of the subdivision be the maximum of the lengths of the diagonals of the $J_j$. In each cell $J_j$ pick a point $(\xi_{1j}, \xi_{2j})$. Define $\Delta_j^2 F(x_1, x_2)$, the second difference of $F(x_1, x_2)$ over the $j$-th cell, as in §2.12, and form the sum

$$S = \sum_{j=1}^{m} \phi(\xi_{1j}, \xi_{2j})\,\Delta_j^2 F(x_1, x_2).$$

By considering the upper and lower sums $S_U$ and $S_L$, defined as in §2.51, we find again that $\lim_{\delta \to 0} S$ exists, and define it to be the Stieltjes integral of $\phi$ with respect to $F$ over $J$:

(b) $$\int_J \phi(x_1, x_2)\,dF(x_1, x_2) = \lim_{\delta \to 0} S.$$

The remarks in §2.51 regarding the independence of (b) of the choice of $(\xi_{1j}, \xi_{2j})$ and of the mode of subdivision remain valid.

As in §2.51 it may be shown that in the discrete case

$$\int_J \phi(x_1, x_2)\,dF(x_1, x_2) = \sum_i \phi(x_{1i}, x_{2i})\,p_i,$$

where $(x_{1i}, x_{2i})$ are the points in $J$ (excluding the left and lower boundaries) where the probabilities are $p_i$ (see §2.12). In the continuous case we may derive

$$\int_J \phi(x_1, x_2)\,dF(x_1, x_2) = \int_{a_2}^{b_2}\int_{a_1}^{b_1} \phi(x_1, x_2)\,f(x_1, x_2)\,dx_1\,dx_2.$$

In the mixed case defined in §2.12, and in the notation employed there, it may be shown that

$$\int_J \phi(x_1, x_2)\,dF(x_1, x_2) = \sum_i p_{2i} \int_{a_1}^{b_1} \phi(x_1, x_{2i})\,f(x_1 \mid x_{2i})\,dx_1,$$

summed for all $i$ such that $a_2 < x_{2i} \le b_2$.

Denote by $R_2$ the entire $x_1x_2$-space. We say that the improper integral

$$\int_{R_2} \phi(x_1, x_2)\,dF(x_1, x_2)$$

exists if and only if the limit

$$\lim_{\substack{a_1, a_2 \to -\infty \\ b_1, b_2 \to +\infty}} \int_J \phi(x_1, x_2)\,dF(x_1, x_2)$$

exists, where $J, a_i, b_i$ are related by (a), as $a_1, a_2, b_1, b_2$ independently become infinite (with the signs indicated).

A generalization of the Stieltjes integral to regions more general than rectangles will be given in §2.53.

2.53 k-Variate Case

We first define the Stieltjes integral over a half-open cell,

(a) $$J:\ a_i < x_i \le b_i, \quad i = 1, 2, \ldots, k.$$

We assume that $F(x_1, x_2, \ldots, x_k)$ is a $k$-variate c. d. f. as defined in §2.13, and that $\phi(x_1, x_2, \ldots, x_k)$ is continuous in $J$ (and on its boundaries). By means of hyperplanes $x_i = $ constant, $i = 1, 2, \ldots, k$, we subdivide $J$ into cells $J_j$, $j = 1, 2, \ldots, m$. Let $\delta$ be the length of the longest of the diagonals of the cells $J_j$. Define $\Delta_j^k F$, the $k$-th difference of $F$ over the cell $J_j$, as in §2.13, and form the sum

$$S = \sum_{j=1}^{m} \phi(\xi_{1j}, \ldots, \xi_{kj})\,\Delta_j^k F,$$

where $(\xi_{1j}, \ldots, \xi_{kj})$ is an arbitrary point in $J_j$. Under the hypotheses we have made, $S$ converges to a limit independent of the choice of $(\xi_{1j}, \ldots, \xi_{kj})$ and of the mode of subdivision, as $\delta \to 0$. We define

$$\int_J \phi(x_1, \ldots, x_k)\,dF(x_1, \ldots, x_k) = \lim_{\delta \to 0} S.$$

Let $R_k$ be the entire $x_1x_2\cdots x_k$-space. The Stieltjes integral of $\phi$ with respect to $F$ over $R_k$ is defined as in §2.52.

Next, let us define the integral over a region $K$ which is the sum of a finite or enumerable number of half-open cells $J_i$, $i = 1, 2, \ldots$:

$$\int_K \phi\,dF = \sum_i \int_{J_i} \phi\,dF.$$

To define the integral over any (B-measurable) region $E$ in $R_k$ we cover $E$ with a region of the type $K$ just considered, and then take as the integral over $E$ the greatest lower bound of the integral over $K$ for all possible $K$ containing $E$:

$$\int_E \phi\,dF = \operatorname*{g.l.b.}_{K \supset E} \int_K \phi\,dF.$$

In terms of our general definition of the Stieltjes integral we see that

$$\int_a^b \phi(x)\,dF(x) = \int_I \phi(x)\,dF(x)$$

only if $I$ is the half-open interval $a < x \le b$. For the closed interval we would have to add $\phi(a)[F(a) - F(a-0)] = \phi(a)\Pr(X = a)$ to the left member; for the open interval, subtract $\phi(b)[F(b) - F(b-0)] = \phi(b)\Pr(X = b)$.

Specializing now to the discrete case, we may say that the most general such case can be described as follows: There is a finite or enumerable number of points $(x_{1j}, x_{2j}, \ldots, x_{kj})$, $j = 1, 2, \ldots$, and associated positive numbers $p_j$, $\sum_j p_j = 1$, such that

$$F(x_1, \ldots, x_k) = \sum_i p_i, \quad\text{summed over all } i \text{ such that } x_{1i} \le x_1, \ldots, x_{ki} \le x_k.$$

In this case

$$\int_E \phi\,dF = \sum_s \phi(x_{1s}, \ldots, x_{ks})\,p_s, \quad\text{summed over all } s \text{ such that } (x_{1s}, \ldots, x_{ks}) \in E.$$

In the continuous case

$$\int_E \phi\,dF = \int_E \phi f\,dV,$$

where $dV$ is the volume element $dx_1\,dx_2 \cdots dx_k$. In the most general case

$$\int_E dF = \Pr(X \in E).$$

It is helpful for some of us to develop an intuitive feeling for the Stieltjes integral. Consider first an ordinary integral

$$\int_E h(x_1, \ldots, x_k)\,dV,$$

where $h$ is continuous. We may conceive of the integral in a Leibnitzian (non-rigorous, but sometimes fruitful) sense: The $k$-dimensional volume $E$ is partitioned into tiny volume elements $dV$. These are so small that the function $h$ is "practically constant" over any $dV$. We multiply this "practically constant" value of $h$ by the volume $dV$ and sum over $E$. Now a c. d. f. $F(x_1, \ldots, x_k)$ defines a probability distribution over $R_k$, of which it is sometimes convenient to think as a mass distribution. We think of $dF$ as being the amount of mass or probability in an infinitesimal volume element $dV$, whether it be concentrated at points, along curves or surfaces, or smeared out as a density. We weight the "practically constant" value of $\phi$ in $dV$ with the amount $dF$ of mass or probability, getting $\phi\,dF$, and we sum over $E$. The reader may see that the definition of $\int_J \phi\,dF$ over a half-open cell $J$ is a rigorous polishing up of the process we have described: In place of $dV$ we use the cell $J_j$; in place of $dF$ we use $\Delta_j^k F$, the probability that a random point be in $J_j$; we multiply not by the "practically constant" value of $\phi$ in $J_j$, but by any value it assumes in $J_j$; and finally, instead of merely summing, we take the limit of the sum.

2.6 Transformation of Variables

Suppose $y = \psi(x)$ is a (B-meas.) function of $x$. Then if $X$ is a random variable with c. d. f. $F(x)$, $Y = \psi(X)$ is also a random variable with c. d. f. $G(y)$ calculated as follows:

$$G(y) = \Pr(Y \le y) = \Pr(\psi(X) \le y) = \int_{E_y} dF(x),$$

where $E_y$ is the totality of points on the $x$-axis for which $\psi(x) \le y$.

More generally, suppose $(X_1, X_2, \ldots, X_k)$ is a random vector variable with c. d. f. $F(x_1, x_2, \ldots, x_k)$, and $y_1, y_2, \ldots, y_n$ are (B-meas.) functions of $x_1, x_2, \ldots, x_k$, $y_i = \psi_i(x_1, x_2, \ldots, x_k)$. Then $(Y_1, Y_2, \ldots, Y_n)$, where $Y_i = \psi_i(X_1, X_2, \ldots, X_k)$, is a random vector variable with c. d. f.

$$G(y_1, y_2, \ldots, y_n) = \int_{E_{y_1, y_2, \ldots, y_n}} dF(x_1, x_2, \ldots, x_k),$$

where $E_{y_1, y_2, \ldots, y_n}$ is the region in $R_k$ defined by $\psi_i(x_1, x_2, \ldots, x_k) \le y_i$, $i = 1, 2, \ldots, n$.

It may be shown that if $X_1, X_2$ are random (possibly vector) variables, and if $Y_1 = \psi_1(X_1)$, $Y_2 = \psi_2(X_2)$ are (B-meas.) functions, then if $X_1$ and $X_2$ are statistically independent, so are the random variables $Y_1$ and $Y_2$.

Transformations of discrete variables offer no especial difficulties, so we consider in the following sections transformations in the continuous case.

The theorems obtained there are essentially corollaries to corresponding theorems on the transformation of integrals, single and multiple. Rigorous proofs of the theorems on integrals may be found in standard real variable texts. For the student in this course the insertion at this point of heuristic proofs which will strengthen his intuitive grasp seems desirable, and accordingly we employ the infinitesimal arguments so useful in applied mathematics.

2.61 Univariate Case

Suppose $X$ is a random variable with p. d. f. $f(x)$. Let $y = \phi(x)$ be a monotone transformation having unique inverse $x = \phi^{-1}(y)$, and such that $\phi'(x)$ exists. Now consider a new random variable $Y = \phi(X)$. The problem here is to determine $\Pr(y < \phi(X) < y + dy)$. Now since $y = \phi(x)$ is monotone, it is clear that the values of $x$ for which $y < \phi(x) < y + dy$ $(dy > 0)$ will lie on an interval $(x, x + dx)$ depending on $y$, where $dx$ may be positive or negative depending on whether $\phi(x)$ is monotone increasing or decreasing. Since $x = \phi^{-1}(y)$ is the inverse of the transformation $y = \phi(x)$, then expressed in terms of $y$, the interval $(x, x + dx)$ becomes $(\phi^{-1}(y), \phi^{-1}(y + dy))$. Hence the value of $\Pr(y < \phi(X) < y + dy)$ is given by determining the value of $\Pr(x < X < x + dx) = \Pr(\phi^{-1}(y) < X < \phi^{-1}(y + dy))$ if $dx > 0$, or $\Pr(x + dx < X < x) = \Pr(\phi^{-1}(y + dy) < X < \phi^{-1}(y))$ if $dx$ is negative. In either case the probability is, except for differentials of order higher than $dy$,

$$f(x)\,|dx| = f(x)\left|\frac{dx}{dy}\right| dy,$$

where $x$ is to be expressed in terms of $y$. We may summarize as follows:

Theorem (A): Let $X$ be a continuous random variable with probability element $f(x)\,dx$, and let $y = \phi(x)$ be a monotone transformation with inverse $x = \phi^{-1}(y)$ such that $\phi'(x)$ exists. Then except for differentials of order higher than $dy$

$$\Pr(y < \phi(X) < y + dy) = g(y)\,dy,$$

where $g(y) = f(x)\left|\dfrac{dx}{dy}\right|$ expressed in terms of $y$.

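Example. Suppose

$$f(x)\,dx = \begin{cases} e^{-x}\,dx & (x \ge 0), \\ 0\,dx & (x < 0), \end{cases}$$

and that it is desired to find $\Pr(y < X^2 < y + dy)$, i. e., the probability element of $y$, say $g(y)\,dy$. We have the transformation $y = x^2$, or $x = \sqrt{y}$, and hence

$$g(y)\,dy = \frac{1}{2\sqrt{y}}\,e^{-\sqrt{y}}\,dy.$$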
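Theorem (A) may be checked numerically against the direct c. d. f. method of §2.6. The sketch below assumes the example concerns the p. d. f. $e^{-x}$ on $x \ge 0$ and the transformation $y = x^2$; the function names are hypothetical, and the comparison uses a small finite $dy$ in place of the differential.

```python
import math


def f(x):
    """Assumed p.d.f. of X: e^{-x} for x >= 0, zero otherwise."""
    return math.exp(-x) if x >= 0.0 else 0.0


def g(y):
    """p.d.f. of Y = X**2 from Theorem (A): f(x) * |dx/dy| with x = sqrt(y)."""
    if y <= 0.0:
        return 0.0
    x = math.sqrt(y)
    return f(x) / (2.0 * x)


def G(y):
    """c.d.f. of Y found directly: Pr(X**2 <= y) = Pr(X <= sqrt(y))."""
    return 1.0 - math.exp(-math.sqrt(y)) if y > 0.0 else 0.0


def probability_element_check(y, dy=1e-6):
    """Return (Pr(y < Y < y + dy) / dy, g(y)); the two should nearly agree."""
    return (G(y + dy) - G(y)) / dy, g(y)


if __name__ == "__main__":
    for y in (0.5, 2.0, 9.0):
        print(y, probability_element_check(y))
```

The agreement improves as $dy$ is made smaller, in keeping with the phrase "except for differentials of order higher than $dy$".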

2.62 Bivariate Case

Suppose

(a) $$y_1 = y_1(x_1, x_2), \qquad y_2 = y_2(x_1, x_2)$$

are functions of $x_1, x_2$ with continuous first partial derivatives. Let $f(x_1, x_2)$ be the joint p. d. f. of $X_1$ and $X_2$. We shall assume further that the transformation (a) is one-to-one, that is, the relation between the $x$'s and $y$'s is such that corresponding to each point in the $x_1x_2$ plane (or that part of it for which the probability function $f(x_1, x_2) \ne 0$) there is one and only one point in the $y_1y_2$ plane, and each point in the $y_1y_2$ plane which has a corresponding point in the $x_1x_2$ plane has one and only one corresponding point in the $x_1x_2$ plane, the relation between any point in the $x_1x_2$ plane and its corresponding point in the $y_1y_2$ plane being given by (a). Let the inverse of the transformation (a) be

(b) $$x_1 = x_1(y_1, y_2), \qquad x_2 = x_2(y_1, y_2).$$

Let the Jacobian of the transformation (b) be

(c) $$\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2mm] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix}.$$

If $X_1, X_2$ are random variables, then $Y_1 = y_1(X_1, X_2)$ and $Y_2 = y_2(X_1, X_2)$ will also be random variables. The problem here is to determine the p. d. f. of $Y_1$ and $Y_2$, say $g(y_1, y_2)$, from $f(x_1, x_2)$, the p. d. f. of $X_1, X_2$, and the transformation (a). In other words, the problem is to determine

(d) $$\Pr(y_1 < Y_1 < y_1 + dy_1,\ y_2 < Y_2 < y_2 + dy_2)$$

to within terms of order $dy_1\,dy_2$.

Consider the infinitesimal region $R$ in the $x_1, x_2$ plane bounded by the curves whose equations are

$$y_1 = y_1(x_1, x_2), \qquad y_2 = y_2(x_1, x_2),$$

$$y_1 + dy_1 = y_1(x_1, x_2), \qquad y_2 + dy_2 = y_2(x_1, x_2),$$

where $dy_1 > 0$, $dy_2 > 0$.

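The situation is represented in Figure 2.

Now the probability (d) is given by $\iint_R f(x_1, x_2)\,dx_1\,dx_2$. By the mean value theorem for integrals the value of this integral is $f(x_1', x_2')\,dA$, where $(x_1', x_2')$ is some point in $R$ and $dA$ is the area of $R$. We must now find an expression for $dA$.

If the coordinates of $P_1$ in Figure 2 are $(x_1, x_2)$ then the coordinates of $P_2, P_3, P_4$ are

(f) $$P_2:\ \left(x_1 + \frac{\partial x_1}{\partial y_1}dy_1,\ x_2 + \frac{\partial x_2}{\partial y_1}dy_1\right), \qquad P_3:\ \left(x_1 + \frac{\partial x_1}{\partial y_2}dy_2,\ x_2 + \frac{\partial x_2}{\partial y_2}dy_2\right),$$

$$P_4:\ \left(x_1 + \frac{\partial x_1}{\partial y_1}dy_1 + \frac{\partial x_1}{\partial y_2}dy_2,\ x_2 + \frac{\partial x_2}{\partial y_1}dy_1 + \frac{\partial x_2}{\partial y_2}dy_2\right),$$

except for infinitesimals of order higher than $dy_1$ and $dy_2$. To show this it is sufficient to consider only one point, say $P_2$. The coordinates of $P_2$ are given by (b) when $y_1$ is replaced by $y_1 + dy_1$. We have

$$x_1 = x_1(y_1 + dy_1, y_2), \qquad x_2 = x_2(y_1 + dy_1, y_2).$$

But $x_1(y_1 + dy_1, y_2) = x_1(y_1, y_2) + \dfrac{\partial x_1}{\partial y_1}dy_1$ + terms of order $(dy_1)^2$ and higher, and $x_2(y_1 + dy_1, y_2) = x_2(y_1, y_2) + \dfrac{\partial x_2}{\partial y_1}dy_1$ + terms of order $(dy_1)^2$ and higher. But $(x_1(y_1, y_2), x_2(y_1, y_2))$ are the coordinates of $P_1$, which have been indicated by $(x_1, x_2)$, thus showing that the approximate coordinates of $P_2$ are those stated in (f). A similar argument holds for the approximate coordinates of $P_3$ and $P_4$.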

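It is clear that $P_1$, together with the points represented by the approximate coordinates of $P_2, P_3, P_4$ given by (f), form a parallelogram $R'$. Now it is known from coordinate geometry that if $(x_1, x_2)$, $(x_1', x_2')$, $(x_1'', x_2'')$ are three vertices of a parallelogram, then the area of the parallelogram is given by the absolute value of the determinant

$$A = \begin{vmatrix} 1 & x_1 & x_2 \\ 1 & x_1' & x_2' \\ 1 & x_1'' & x_2'' \end{vmatrix}.$$

Hence the area of the parallelogram $R'$ is given by the absolute value of

(g) $$\begin{vmatrix} 1 & x_1 & x_2 \\[1mm] 1 & x_1 + \dfrac{\partial x_1}{\partial y_1}dy_1 & x_2 + \dfrac{\partial x_2}{\partial y_1}dy_1 \\[2mm] 1 & x_1 + \dfrac{\partial x_1}{\partial y_2}dy_2 & x_2 + \dfrac{\partial x_2}{\partial y_2}dy_2 \end{vmatrix} = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_2}{\partial y_1} \\[2mm] \dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix} dy_1\,dy_2 = \frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\,dy_1\,dy_2.$$

But since the coordinates of the vertices of parallelogram $R'$ differ from the corresponding coordinates of the corresponding vertices of $R$ by terms of order higher than $dy_1$ or $dy_2$, it follows that the area of $R$ (i. e., $dA$) differs from the area of $R'$ by terms of order higher than $dy_1\,dy_2$.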

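Since $f(x_1, x_2)$, the p. d. f. of $X_1, X_2$, is continuous, we have that $f(x_1', x_2')$ differs from $f(x_1, x_2)$ by an amount which vanishes with $dy_1$ and $dy_2$, where $(x_1', x_2')$ is any point in $R$. Therefore we have the result that the probability expressed by (d) is equal, to within terms of order $dy_1\,dy_2$, to

(h) $$f(x_1, x_2)\left|\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\right| dy_1\,dy_2,$$

where the $x$'s are to be expressed in terms of $y$'s by (b). It may be verified by the reader that

$$\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = \left[\frac{\partial(y_1, y_2)}{\partial(x_1, x_2)}\right]^{-1}.$$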


We may summarize in the following:

Theorem (B): Let $X_1, X_2$ be two continuous random variables with p. d. f. $f(x_1, x_2)$. Let $y_1 = y_1(x_1, x_2)$, $y_2 = y_2(x_1, x_2)$ be a transformation with a unique inverse $x_1 = x_1(y_1, y_2)$, $x_2 = x_2(y_1, y_2)$, such that the first partial derivatives of the $y$'s with respect to the $x$'s exist. If the random variables $y_1(X_1, X_2)$ and $y_2(X_1, X_2)$ are denoted by $Y_1$ and $Y_2$ respectively, then

$$\Pr(y_1 < Y_1 < y_1 + dy_1;\ y_2 < Y_2 < y_2 + dy_2) = f(x_1, x_2)\left|\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\right| dy_1\,dy_2,$$

where $x_1$ and $x_2$ are expressed in terms of $y_1, y_2$ by (b), and $\dfrac{\partial(x_1, x_2)}{\partial(y_1, y_2)}$ is given by (c).

Example: To illustrate the transformation problem for two random variables, suppose the probability element of $X_1$ and $X_2$ is

$$f(x_1, x_2)\,dx_1\,dx_2 = \frac{1}{2\pi}\,e^{-\frac{1}{2}x_1^2 - \frac{1}{2}x_2^2}\,dx_1\,dx_2$$

defined over the entire plane. It is required to determine the p. d. f. of $Y_1$ and $Y_2$, where

$$Y_1 = \sqrt{X_1^2 + X_2^2}, \qquad Y_2 = \tan^{-1}\frac{X_2}{X_1}.$$

The transformation involved here is

$$y_1 = \sqrt{x_1^2 + x_2^2}, \qquad y_2 = \tan^{-1}\frac{x_2}{x_1},$$

defined over that part of the $y_1, y_2$ plane for which $y_1 \ge 0$ and $0 \le y_2 \le 2\pi$. The inverse of the transformation is

$$x_1 = y_1\cos y_2, \qquad x_2 = y_1\sin y_2.$$

We have

$$\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = \begin{vmatrix} \cos y_2 & -y_1\sin y_2 \\ \sin y_2 & y_1\cos y_2 \end{vmatrix} = y_1.$$

Therefore by Theorem (B), the probability element of $Y_1, Y_2$ is

$$\frac{1}{2\pi}\,e^{-\frac{1}{2}y_1^2}\,y_1\,dy_1\,dy_2.$$
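The Jacobian in this example can be confirmed without any algebra by differencing the inverse transformation. The Python sketch below is an illustration under assumptions: the central-difference step and the function names are hypothetical choices, and the p. d. f. is taken to be the bivariate normal form with unit variances.

```python
import math


def inverse_transform(y1, y2):
    """Inverse transformation of the example: x1 = y1 cos y2, x2 = y1 sin y2."""
    return y1 * math.cos(y2), y1 * math.sin(y2)


def jacobian(y1, y2, step=1e-6):
    """Central-difference estimate of the determinant d(x1, x2) / d(y1, y2)."""
    plus, minus = inverse_transform(y1 + step, y2), inverse_transform(y1 - step, y2)
    dx1_dy1 = (plus[0] - minus[0]) / (2.0 * step)
    dx2_dy1 = (plus[1] - minus[1]) / (2.0 * step)
    plus, minus = inverse_transform(y1, y2 + step), inverse_transform(y1, y2 - step)
    dx1_dy2 = (plus[0] - minus[0]) / (2.0 * step)
    dx2_dy2 = (plus[1] - minus[1]) / (2.0 * step)
    return dx1_dy1 * dx2_dy2 - dx1_dy2 * dx2_dy1


def f(x1, x2):
    """Assumed joint p.d.f. of X1, X2: independent unit normal variables."""
    return math.exp(-0.5 * (x1 * x1 + x2 * x2)) / (2.0 * math.pi)


def g(y1, y2):
    """p.d.f. of (Y1, Y2) from Theorem (B): f times the absolute Jacobian."""
    x1, x2 = inverse_transform(y1, y2)
    return f(x1, x2) * abs(jacobian(y1, y2))


if __name__ == "__main__":
    print(jacobian(1.7, 0.9), g(1.7, 0.9))
```

The estimated Jacobian should be close to $y_1$, and `g` should be close to $y_1 e^{-y_1^2/2}/(2\pi)$, which does not involve $y_2$.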

2.63 k-Variate Case

Let the joint p. d. f. of $X_1, X_2, \ldots, X_k$ be $f(x_1, x_2, \ldots, x_k)$, and introduce new random variables $Y_1, Y_2, \ldots, Y_k$ by means of the one-to-one transformation

(a) $$y_i = y_i(x_1, x_2, \ldots, x_k), \quad i = 1, 2, \ldots, k.$$

Let the inverse (which will be unique) of this transformation be

(b) $$x_i = x_i(y_1, y_2, \ldots, y_k), \quad i = 1, 2, \ldots, k,$$

and its Jacobian

(c) $$\frac{\partial(x_1, x_2, \ldots, x_k)}{\partial(y_1, y_2, \ldots, y_k)} = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_k} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x_k}{\partial y_1} & \dfrac{\partial x_k}{\partial y_2} & \cdots & \dfrac{\partial x_k}{\partial y_k} \end{vmatrix},$$

assuming, of course, that the first partial derivatives exist.

By pursuing an argument similar to that used in the bivariate case, we find that the probability element of the $Y_i$, say $g(y_1, y_2, \ldots, y_k)\,dy_1 \cdots dy_k$, is given by

(d) $$f(x_1, x_2, \ldots, x_k)\left|\frac{\partial(x_1, x_2, \ldots, x_k)}{\partial(y_1, y_2, \ldots, y_k)}\right| dy_1\,dy_2 \cdots dy_k,$$

where the $x$'s are to be expressed in terms of $y$'s by (b).

This covers the cases where the number $n$ of new variables equals the number $k$ of original variables. It can be shown that if $n > k$, there exists no p. d. f. for the $n$ new variables. (Note here the complete generality of the treatment by means of the c. d. f. in §2.6.) If $n < k$ the usual method of getting the p. d. f. of the new variables is to adjoin further variables to fill out the number of new variables to $k$, use the above procedure, and then "integrate out" the extra variables by getting the marginal distribution of the $n$ variables whose p. d. f. is desired.
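The procedure for $n < k$ may be illustrated with $k = 2$, $n = 1$. In the sketch below, which rests on assumptions not in the text, $X_1$ and $X_2$ are taken to be independent with p. d. f. $e^{-x}$ on $x > 0$, the variable of interest is $Y_1 = X_1 + X_2$, and $Y_2 = X_2$ is adjoined; the extra variable is then integrated out by a midpoint rule.

```python
import math


def f(x1, x2):
    """Assumed joint p.d.f.: independent unit exponential variables."""
    return math.exp(-(x1 + x2)) if x1 > 0.0 and x2 > 0.0 else 0.0


def joint_g(y1, y2):
    """p.d.f. of (Y1, Y2) = (X1 + X2, X2).

    The inverse is x1 = y1 - y2, x2 = y2, whose Jacobian equals 1.
    """
    return f(y1 - y2, y2) * 1.0


def marginal_g1(y1, n=2000):
    """Integrate the adjoined variable y2 out over (0, y1) by the midpoint rule."""
    if y1 <= 0.0:
        return 0.0
    width = y1 / n
    return sum(joint_g(y1, (i + 0.5) * width) for i in range(n)) * width


if __name__ == "__main__":
    for y1 in (0.5, 1.5, 4.0):
        print(y1, marginal_g1(y1), y1 * math.exp(-y1))
```

The numerical marginal should agree with $y_1 e^{-y_1}$, the value obtained by carrying out the integration by hand.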

2.7 Mean Value

We begin with the definition of the mean value of a random variable in general and then consider in later sections the mean values of particular (random) functions of especial interest in statistics. If $X$ is a random variable with c. d. f. $F(x)$ we define the mean value of $X$ as

(a) $$E(X) = \int_{-\infty}^{+\infty} x\,dF(x).$$

This is also called the expected value of $X$.

If $Y = \phi(X)$ is a continuous function of $X$, then the c. d. f. of $Y$ is (§2.6)

$$G(y) = \int_{E_y} dF(x),$$

where $E_y$ is the set of points on the $x$-axis such that $\phi(x) \le y$. From (a),

$$E(Y) = \int_{-\infty}^{+\infty} y\,dG(y),$$

and this may be shown to be equivalent to

(b) $$E[\phi(X)] = \int_{-\infty}^{+\infty} \phi(x)\,dF(x).$$

If random variables $X_1, X_2, \ldots, X_k$ have the c. d. f. $F(x_1, x_2, \ldots, x_k)$, and $y = \phi(x_1, x_2, \ldots, x_k)$ is continuous, then from the definition (a) it may be shown that

(c) $$E[\phi(X_1, X_2, \ldots, X_k)] = \int_{R_k} \phi(x_1, x_2, \ldots, x_k)\,dF(x_1, x_2, \ldots, x_k),$$

where $R_k$ is the entire $k$-space. Of course if the improper integral does not exist in the sense explained in §2.51 ff., we say that the mean value of $\phi$ does not exist. In the light of the intuitive discussion (§2.53) of the meaning of a Stieltjes integral, we see from (c) that the mean value of $\phi$ may be regarded as an average over $k$-space of the function $\phi$, the average being taken over volume elements $dV$, and the weight assigned to each contribution being the total probability in $dV$.

For the discrete and continuous cases, the expressions (b) and (c) may be analyzed into the forms given in §§2.51, 2.53.
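The equivalence of the two expressions for the mean of $Y = \phi(X)$ is easily verified in the discrete case. The following Python sketch uses an assumed five-point distribution and the hypothetical function $\phi(x) = x^2$; it first finds the distribution of $Y$ and averages over it, and then applies the discrete form of (b) directly.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed discrete distribution of X: jumps p_k at the points a_k.
pmf_x = {-2: Fraction(1, 10), -1: Fraction(2, 10), 0: Fraction(3, 10),
         1: Fraction(3, 10), 2: Fraction(1, 10)}


def phi(x):
    return x * x


def mean_via_distribution_of_y():
    """E(Y) = sum of y * Pr(Y = y), after first finding the distribution of Y."""
    pmf_y = defaultdict(Fraction)
    for x, p in pmf_x.items():
        pmf_y[phi(x)] += p
    return sum(y * p for y, p in pmf_y.items())


def mean_via_b():
    """E[phi(X)] = sum of phi(a_k) * p_k, the discrete form of (b)."""
    return sum(phi(x) * p for x, p in pmf_x.items())


if __name__ == "__main__":
    print(mean_via_distribution_of_y(), mean_via_b())
```

Both routes give the same rational number, so that in practice the distribution of $Y$ need not be found in order to compute its mean.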

2.71 Univariate Case; Tchebycheff's Inequality

The mean value of $X^i$,

$$\mu_i' = E(X^i) = \int_{-\infty}^{+\infty} x^i\,dF(x), \quad i = 0, 1, 2, \ldots,$$

is called the $i$-th moment of the distribution $F(x)$ about the origin. $\mu_0' = 1$ for any $F(x)$; $\mu_1' = E(X)$ is called the mean of $X$, also the mean of the distribution, and denoted by $a$. The $i$-th moment about the mean is defined to be

(a) $$\mu_i = E[(X - a)^i] = \int_{-\infty}^{+\infty} (x - a)^i\,dF(x), \quad i = 0, 1, 2, \ldots$$

For any $F(x)$, $\mu_1 = 0$. The variance of $X$, or the variance of the distribution, is defined to be $\mu_2$ and is denoted by the special symbol $\sigma^2$. $\sigma \ge 0$ is called the standard deviation of $X$ or of the distribution. A formula for expressing $\mu_i$ in terms of $\mu_1', \mu_2', \ldots, \mu_i'$ is obtained by using the binomial theorem in (a) and then integrating. In particular, we find that

$$\sigma^2 = \mu_2' - a^2.$$

An important theorem about arbitrary distributions with finite variance is contained in the Tchebycheff inequality:

(b) $$\Pr(|X - a| > \delta\sigma) \le 1/\delta^2.$$
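Before turning to the proof, the inequality may be tried on a definite case. The sketch below is an illustration with an assumed distribution (the sum of two fair dice); the mean, the variance, and the tail probability are computed exactly and the tail is compared with the bound $1/\delta^2$.

```python
from fractions import Fraction

# Assumed example: X is the sum of the faces shown by two fair dice.
pmf = {}
for first in range(1, 7):
    for second in range(1, 7):
        total = first + second
        pmf[total] = pmf.get(total, Fraction(0)) + Fraction(1, 36)

mean = sum(x * p for x, p in pmf.items())                    # a
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # sigma squared


def tail_probability(delta):
    """Pr(|X - a| > delta * sigma), found exactly by comparing squares."""
    return sum(p for x, p in pmf.items() if (x - mean) ** 2 > delta ** 2 * variance)


if __name__ == "__main__":
    for delta in (1, 2, 3):
        print(delta, tail_probability(delta), Fraction(1, delta * delta))
```

In this example the actual tail probabilities are well below the bound, which is to be expected of an inequality that holds for every distribution with finite variance.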


To prove (b) we break up the Integral for 





$$a_1 = \int_{-\infty}^{+\infty} x_1\,dF_1(x_1) = \int_{R_2} x_1\,dF(x_1, x_2) = \mu_{10}'.$$

Similar statements apply to $a_2 = E(X_2)$. The point $(a_1, a_2)$ may be called the mean of the distribution; the moments about the mean for $F(x_1, x_2)$ are defined by

(a) $$\mu_{ij} = E[(X_1 - a_1)^i (X_2 - a_2)^j] = \int_{R_2} (x_1 - a_1)^i (x_2 - a_2)^j\,dF(x_1, x_2), \quad i, j = 0, 1, 2, \ldots$$

For any $F(x_1, x_2)$, $\mu_{00} = 1$, $\mu_{10} = \mu_{01} = 0$. The variance of $X_1$ has already been defined in §2.71; we note that it is $\sigma_1^2 = \mu_{20}$. Likewise, $\sigma_2^2 = \mu_{02}$. The remaining second order moment $\mu_{11}$ is called the covariance of $X_1$ and $X_2$. The quotient

(b) $$\rho_{12} = \frac{\mu_{11}}{\sigma_1\sigma_2}$$

is called the correlation coefficient of $X_1$ and $X_2$. By means of the Schwarz inequality it may be shown that $-1 \le \rho_{12} \le 1$. As an exercise the reader may show that if $X_1$ and $X_2$ are statistically independent, then $\rho_{12} = 0$, but the converse is false.

The reader may also verify that a necessary and sufficient condition for $\rho_{12} = 1$ is that all of the probability in the $x_1x_2$ plane be concentrated along some straight line with positive slope. (For $\rho_{12} = -1$ the slope must be negative.)

Formulas giving the moments about the mean in terms of the moments about the origin may again be obtained from (a); in particular, it is found that

$$\mu_{11} = \mu_{11}' - a_1a_2,$$

$$\sigma_1^2 = \mu_{20}' - a_1^2,$$

and these expressions may then be substituted in (b) to evaluate the correlation coefficient in terms of the first and second order moments about the origin.
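Two of the exercises left to the reader can be checked on small discrete cases. The sketch below is illustrative only; the two distributions (`dependent`, in which $X_2 = X_1^2$, and `on_line`, concentrated on the line $x_2 = 2x_1 + 1$) and the helper names are assumptions made for the example.

```python
from fractions import Fraction


def moments(pmf):
    """Return (a1, a2, mu20, mu02, mu11) for a discrete bivariate distribution
    given as a dictionary {(x1, x2): probability}."""
    a1 = sum(x1 * p for (x1, _), p in pmf.items())
    a2 = sum(x2 * p for (_, x2), p in pmf.items())
    mu20 = sum((x1 - a1) ** 2 * p for (x1, _), p in pmf.items())
    mu02 = sum((x2 - a2) ** 2 * p for (_, x2), p in pmf.items())
    mu11 = sum((x1 - a1) * (x2 - a2) * p for (x1, x2), p in pmf.items())
    return a1, a2, mu20, mu02, mu11


def correlation(pmf):
    """Correlation coefficient mu11 / (sigma1 * sigma2)."""
    _, _, mu20, mu02, mu11 = moments(pmf)
    return float(mu11) / (float(mu20) * float(mu02)) ** 0.5


# X1 uniform on {-1, 0, 1} and X2 = X1**2: dependent, yet uncorrelated.
dependent = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}

# All probability on the line x2 = 2*x1 + 1, which has positive slope.
on_line = {(0, 1): Fraction(1, 4), (1, 3): Fraction(1, 2), (2, 5): Fraction(1, 4)}

if __name__ == "__main__":
    print(correlation(dependent), correlation(on_line))
```

In the first case the covariance vanishes although $\Pr(X_1 = 0, X_2 = 0) = 1/3$ differs from $\Pr(X_1 = 0)\Pr(X_2 = 0) = 1/9$, so the variables are not independent; in the second the correlation coefficient equals one.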


2.73 k-Variate Case

The moments of a distribution $F(x_1, x_2, \ldots, x_k)$ about the origin are defined as

$$\mu_{j_1 j_2 \ldots j_k}' = E\left(\prod_{i=1}^{k} X_i^{j_i}\right) = \int_{R_k} \prod_{i=1}^{k} x_i^{j_i}\,dF(x_1, x_2, \ldots, x_k),$$

where $R_k$ is the complete $k$-space. For any $F$, $\mu_{00\ldots0}' = 1$. The mean of $X_1$, defined in §2.71, may now be seen to be $\mu_{100\ldots0}'$, and can be expressed also by means of integrals with respect to marginal distributions of various orders. We denote $E(X_1)$ by $a_1$, and note that the above statements apply to $a_2 = E(X_2), \ldots, a_k = E(X_k)$. The point $(a_1, a_2, \ldots, a_k)$ is called the mean of the distribution, and the moments $\mu$ about the mean are defined to be

$$\mu_{j_1 j_2 \ldots j_k} = E\left[\prod_{i=1}^{k} (X_i - a_i)^{j_i}\right] = \int_{R_k} \prod_{i=1}^{k} (x_i - a_i)^{j_i}\,dF.$$

We note $\mu_{00\ldots0} = 1$. In order to simplify the notation, we specialize the following remarks to the variable $X_1$ or the pair $X_1, X_2$; their generalizations are obvious: $\mu_{100\ldots0} = 0$. The variance of $X_1$, defined in §2.71, is seen to be $\mu_{200\ldots0}$. The covariance of $X_1$ and $X_2$, defined in §2.72, is $\mu_{1100\ldots0}$. The correlation coefficient of $X_1$ and $X_2$ is

$$\rho_{12} = \mu_{1100\ldots0}/(\mu_{200\ldots0}\,\mu_{020\ldots0})^{\frac{1}{2}}.$$

These quantities may all be expressed in terms of the first and second order moments about the origin.


2.74 Mean and Variance of a Linear Combination of Random Variables

Suppose we have $k$ random variables $X_1, X_2, \ldots, X_k$, the c. d. f. of $X_i$ being $F_i(x_i)$. Let their joint c. d. f. be $F(x_1, x_2, \ldots, x_k)$. $F_i(x_i)$ is then the marginal distribution (§2.2) of $X_i$; if the $X_i$ are mutually (statistically) independent

$$F(x_1, x_2, \ldots, x_k) = \prod_{i=1}^{k} F_i(x_i),$$

but we shall not assume this. Let $y = \psi(x_1, x_2, \ldots, x_k)$ be a linear function,

(a) $$\psi = \sum_{i=1}^{k} c_i x_i.$$

Then $Y = \psi(X_1, X_2, \ldots, X_k) = \sum_{i=1}^{k} c_i X_i$ is a random variable (§2.6); its c. d. f. $G(y)$ is
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K K 

\ ij, ■ 


Mhere the are eonstanta and the are random variables with joint c. d. f . 

7(Zf,X2,...,Xj^). Tor exaiQ)le, the case of no overlapping would be obtained by requiring 
- 0, 1 » l,2,...,k. If E(X^) - aj^, then from (b) of {a.?*** 






Hence the oovarianoe of Y and is 

a p 


K K K 

where is the vcurlance of and j is the covariance Xj and Xj . Hence the cor- 


relation coefficient between Y and Y. la 

« p 






^^“l«J^lO'jPljl/^,PlPj<^c^Plj 


from (b) of §2.72 and (c) of §2.74. Special cases of this formula for the correlation coefficient are much used in education and psychology in connection with tests.

2.76 The Moment Problem 

The general moment problem (univariate) is twofold: (i) given an infinite sequence of numbers $\mu_0' = 1, \mu_1', \mu_2', \ldots$, does there exist a distribution with these numbers as moments? and if so, (ii) is the distribution unique? It is usually only the problem (ii) that arises in statistics. It may be shown that whenever the moment generating function $\phi(\theta)$ (see §2.8) exists for $-h \le \theta \le h$, $h > 0$, there is a unique* distribution with the moments $\phi^{(i)}(0)$.

*$\phi(\theta)$ then is analytic in a strip containing the imaginary axis, hence the characteristic function $f(t) = \phi(it)$ is analytic for all real $t$, and this is a sufficient condition for uniqueness in the moment problem: See P. Lévy, Théorie de l'addition des variables aléatoires, Monographies des probabilités, Paris, 1937, p. ^1.

Necessary and sufficient conditions for the unique determination of a distribution by its moments are extremely complicated, but the following theorem gives an easily applied sufficient condition of Carleman:

Theorem (A): A sufficient condition for the uniqueness of a distribution with moments $\mu_i'$ is that the series $\sum_{m=1}^{\infty} (\mu_{2m}')^{-1/(2m)}$ diverge.

For a multivariate distribution with moments $\mu'$ define

(a) $$\lambda_{2m} = \mu_{(2m)00\ldots0}' + \mu_{0(2m)0\ldots0}' + \cdots + \mu_{00\ldots0(2m)}'.$$

A sufficient condition of Cramér and Wold** for uniqueness is Theorem (B), of which (A) may be regarded as a special case:

Theorem (B): If the series $\sum_{m=1}^{\infty} \lambda_{2m}^{-1/(2m)}$ diverges, where $\lambda_{2m}$ is defined by (a), then the distribution $F(x_1, x_2, \ldots, x_k)$ is uniquely determined by its moments.

2.8 Moment Generating Functions

When the moment generating function (m. g. f.) of a distribution satisfies a certain condition given below, then the moments of the distribution may easily be found by differentiation of the moment generating function. The use of the m. g. f. also permits the easy determination of the distribution of certain functions of certain random variables. We consider in detail the

2.81 Univariate Case

For any distribution $F(x)$ we define the m. g. f. as

(a) $$\phi(\theta) = E(e^{\theta X}) = \int_{-\infty}^{+\infty} e^{\theta x}\,dF(x).$$

If we proceed heuristically, we may write

(b) $$\phi^{(i)}(0) = \left[\frac{d^i}{d\theta^i}\int_{-\infty}^{+\infty} e^{\theta x}\,dF(x)\right]_{\theta=0} = \left[\int_{-\infty}^{+\infty} x^i e^{\theta x}\,dF(x)\right]_{\theta=0} = \int_{-\infty}^{+\infty} x^i\,dF(x) = \mu_i'.$$

Let us now consider under what conditions $\mu_i' = \phi^{(i)}(0)$.

In order that (t)(6), considered as a function of a real variable, possess de- 
rivatives at 6 « 0, It la necessary that 6(6) as defined by (a) exist in a nel^borhood 


♦H. Hamburger, "liber elne Erwelterung des Stlelt jesschen Momentenproblems", Math . 
Annalen . vol. 8i (1920), pp. 235 - 519 , and vol. 82 (1921), pp. 120-16U, 168-187. 

♦♦H. Cramer and H. Wold, "Some theorems on distribution fn^ictions". Jour . London Math . 
Soc., vol. 11 ( 1936 ), pp. 290 - 29 ^. 
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Sg , t 6l 

-h ^ 6 ^ h, h > 0 . (Note that In any case (t>( 0 ) « i la defined by (a)). We aee now that 

thia restricts the class of functions F(x) under consideration. Our definition (52.51) 

+00 +00 0 

of the Infinite Integral J Implies the existence of f and f . Hence as x -♦ + oo , 

-00 0 -00 

P(x) 1 sufficiently rapidly so that 

+00 

(c) J e^dP(x) < 00 , 

0 

and as X -+ -00 , F(x) -+ 0 sufficiently rapidly so that 

0 

(d) •'*2 “ J e"^dP(x) < CD . 

-CD 

This means that $F(x)$ possesses moments of all orders. To demonstrate the finiteness of

  $\int_{-\infty}^{+\infty} x^i\, dF(x),$

consider

  $\int_0^{+\infty} x^i\, dF(x) = \int_0^{a} x^i\, dF(x) + \int_a^{+\infty} (x^i e^{-hx})\, e^{hx}\, dF(x).$

Choose a so large that $x^i e^{-hx} < 1$ for $x > a$. Then the second term of the right member is less than $I_1$ defined by (c); the first term is certainly finite, and thus

  $\int_0^{+\infty} x^i\, dF(x) < \infty.$

Similarly by use of (d) we may show

  $\left|\int_{-\infty}^{0} x^i\, dF(x)\right| < \infty,$

and hence $|\mu'_i| < \infty$ for all i.

We now state the heuristically obtained relation (b) in the form of

Theorem (A): If the m. g. f. $\phi(\theta)$ of a c. d. f. $F(x)$, as defined by (a), exists for $-h \le \theta \le h$, where $h > 0$, then the i-th moment of $F(x)$ about the origin is

  $\mu'_i = \phi^{(i)}(0), \quad i = 0, 1, 2, \ldots$

The proof of this theorem may be based on the theory of the bilateral Laplace transform and is beyond the level of this course.*
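Theorem (A) can be checked numerically for a distribution with finitely many mass points, where the m. g. f. certainly exists for all $\theta$. The sketch below is only an illustration: the mass points, the probabilities, the step size and the function names are all assumptions made for the example, and the derivatives at $\theta = 0$ are approximated by central differences rather than computed exactly.

```python
import math

def mgf(theta, xs, ps):
    """phi(theta) = sum_j p_j * exp(theta * x_j) for a discrete distribution."""
    return sum(p * math.exp(theta * x) for x, p in zip(xs, ps))

def moment_from_mgf(order, xs, ps, h=1e-3):
    """Approximate phi^(order)(0) by a central finite difference (orders 1 and 2 only)."""
    if order == 1:
        return (mgf(h, xs, ps) - mgf(-h, xs, ps)) / (2 * h)
    if order == 2:
        return (mgf(h, xs, ps) - 2 * mgf(0.0, xs, ps) + mgf(-h, xs, ps)) / h**2
    raise ValueError("only orders 1 and 2 are sketched here")

def moment_direct(order, xs, ps):
    """The moment about the origin computed directly as sum_j p_j * x_j**order."""
    return sum(p * x**order for x, p in zip(xs, ps))

# Hypothetical distribution chosen only for illustration.
xs = [0, 1, 2, 5]
ps = [0.1, 0.4, 0.3, 0.2]
for i in (1, 2):
    print(i, moment_from_mgf(i, xs, ps), moment_direct(i, xs, ps))
```

The two columns printed for each order should agree to within the finite-difference error.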

The m. g. f., if it exists, is uniquely determined by (a). The converse is stated in

Theorem (B):** If $F(x)$ has the m. g. f. $\phi(\theta)$, and $\phi(\theta)$ exists for $-h \le \theta \le h$, $h > 0$, and if the c. d. f. $G(x)$ has the same m. g. f., then $G(x) \equiv F(x)$.

The reader may write out an expression for $\phi(\theta)$ in the discrete case, which is a sum of terms, and an expression in the continuous case, which is an ordinary integral, by using the analysis of §2.51.

We note that if $Y = \psi(X)$ is a continuous function of X, and $G(y)$ is the c. d. f. of Y, then the m. g. f. of $G(y)$ is

  $\int_{-\infty}^{+\infty} e^{\theta \psi(x)}\, dF(x).$

If this exists for $|\theta| \le h$ $(h > 0)$ and is recognized as the m. g. f. of a known distribution, then Theorem (B) determines $G(y)$.

In certain problems, particularly in sampling theory, it is important to know the limiting form of a c. d. f. as $n \to \infty$ of a function $X_n$ of n random variables. The m. g. f. offers a powerful method for determining the limit of this distribution. The method is to obtain the m. g. f. of $X_n$, say $\phi_{(n)}(\theta)$; then if $\phi_{(n)}(\theta)$ has a limiting form as $n \to \infty$ which is the m. g. f. of some c. d. f. $F(x)$, we may conclude under certain conditions that $\lim_{n \to \infty} F_{(n)}(x) = F(x)$. More precisely we shall state the following theorem without proof.***

Theorem (C): Let $F_{(n)}(x)$ and $\phi_{(n)}(\theta)$ be respectively the c. d. f. and m. g. f. of a random variable $X_n$ $(n = 1, 2, 3, \ldots)$. If $\phi_{(n)}(\theta)$ exists for $|\theta| < h$ for all n and if there exists a function $\phi(\theta)$ such that $\lim_{n \to \infty} \phi_{(n)}(\theta) = \phi(\theta)$ for $|\theta| < h'$, then $\lim_{n \to \infty} F_{(n)}(x) = F(x)$, where $F(x)$ is the c. d. f. of a random variable X with m. g. f. $\phi(\theta)$.


*D. V. Widder, The Laplace Transform, Princeton University Press, 1941, p. 240.

**If the integral defining $\phi(\theta)$ exists on the real interval (-h, h), it exists for complex $\theta$ in the strip determined by the condition that the real part of $\theta$ be in the interval, and $\phi$ is an analytic function in the strip; see Widder, loc. cit. Hence if for $F(x)$ and $G(x)$ the moment generating functions coincide in the interval, they coincide in the strip. For coincidence in the strip there is a uniqueness theorem: Widder, p. 245.

***For proof, see J. H. Curtiss, "On the Theory of Moment Generating Functions", Annals of Math. Stat., Vol. 13, No. 4, pp. 430-433.


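2.82 Multivariate Case

The m. g. f. of a distribution $F(x_1, x_2, \ldots, x_k)$ is defined to be

(a)  $\phi(\theta_1, \theta_2, \ldots, \theta_k) = E\left(e^{\sum_{i=1}^{k} \theta_i x_i}\right) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} e^{\sum_{i=1}^{k} \theta_i x_i}\, dF.$

We assume

(b)  $\phi$ exists for $-h \le \theta_i \le h$, $h > 0$, $i = 1, 2, \ldots, k$,

and then may consider restrictions on $F$, analogous to those of §2.81, implied by (b). We state without proof

Theorem (A): Under the assumption (b),

  $\mu'_{j_1 j_2 \ldots j_k} = \left.\frac{\partial^{\,j_1 + j_2 + \cdots + j_k} \phi}{\partial\theta_1^{j_1}\, \partial\theta_2^{j_2} \cdots \partial\theta_k^{j_k}}\right|_{\theta_1 = \theta_2 = \cdots = \theta_k = 0}.$

Theorem (B): If $\phi$ satisfies condition (b), it uniquely determines $F$.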

Let $F_i(x_i)$, with m. g. f. $\phi_i(\theta_i)$, be the c. d. f.'s of mutually independent variables $X_i$, $i = 1, 2, \ldots, k$. Then the joint c. d. f. is

(c)  $F(x_1, x_2, \ldots, x_k) = \prod_{i=1}^{k} F_i(x_i),$

and the m. g. f. of $F$ is

(d)  $\phi(\theta_1, \theta_2, \ldots, \theta_k) = \int \cdots \int e^{\sum_{i=1}^{k} \theta_i x_i}\, dF = \prod_{i=1}^{k} \int_{-\infty}^{+\infty} e^{\theta_i x_i}\, dF_i(x_i) = \prod_{i=1}^{k} \phi_i(\theta_i).$

By the uniqueness Theorem (B) it follows that if the m. g. f. is (d), the distribution is (c).

Theorem (C): Suppose that random variables $X_i$, $i = 1, 2, \ldots, k$, have c. d. f.'s $F_i(x_i)$ with m. g. f.'s $\phi_i(\theta_i)$, and that all $\phi_i(\theta_i)$ satisfy condition (b). Then the $X_i$ are mutually independent if and only if the m. g. f. $\phi$ of the joint distribution $F$ factors according to (d).
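The factorization criterion of Theorem (C) may be illustrated on two small discrete distributions. In the following sketch (all probabilities, the perturbation size and the function names are assumptions chosen for the example) an independent joint law is built from two marginals, and a dependent law with the same marginals is obtained by shifting a little probability between four cells; only the first m. g. f. factors.

```python
import math
from itertools import product

def mgf_1d(theta, pmf):
    """m.g.f. of a univariate discrete law given as {x: prob}."""
    return sum(p * math.exp(theta * x) for x, p in pmf.items())

def mgf_joint(thetas, joint):
    """m.g.f. of a joint discrete law given as {(x1, x2): prob}."""
    return sum(p * math.exp(sum(t * x for t, x in zip(thetas, xs)))
               for xs, p in joint.items())

pmf1 = {0: 0.3, 1: 0.7}
pmf2 = {-1: 0.2, 0: 0.5, 2: 0.3}

# Independent joint law: product of the marginals, as in (c).
independent = {(x1, x2): p1 * p2
               for (x1, p1), (x2, p2) in product(pmf1.items(), pmf2.items())}

# A dependent law with the same marginals: move mass eps around a 2x2 block.
eps = 0.05
dependent = dict(independent)
dependent[(0, -1)] += eps
dependent[(1, 2)] += eps
dependent[(0, 2)] -= eps
dependent[(1, -1)] -= eps

def max_factor_gap(joint, grid=(-0.5, 0.0, 0.5)):
    """Largest |phi(t1, t2) - phi_1(t1) * phi_2(t2)| over a small grid of thetas."""
    return max(abs(mgf_joint((t1, t2), joint) - mgf_1d(t1, pmf1) * mgf_1d(t2, pmf2))
               for t1 in grid for t2 in grid)

print(max_factor_gap(independent), max_factor_gap(dependent))
```

The first gap is zero up to rounding, while the second is not, although both joint laws have identical marginal distributions.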
The theorem is also valid in the case where the $X_i$ are vector variables (then the $\theta_i$ are also vectors).

If $Y_i = \psi_i(X_1, X_2, \ldots, X_k)$, $i = 1, 2, \ldots, t$, are continuous functions, then a method of determining the joint c. d. f. $G(y_1, y_2, \ldots, y_t)$ of the variables $Y_i$ is to form the m. g. f. of $G$; it is

  $\phi(\theta_1, \theta_2, \ldots, \theta_t) = E\left[e^{\sum_{i=1}^{t} \theta_i Y_i}\right] = \int \cdots \int e^{\sum_{i=1}^{t} \theta_i \psi_i(x_1, x_2, \ldots, x_k)}\, dF(x_1, x_2, \ldots, x_k).$

If this exists for $|\theta_i| \le h$, $h > 0$, $i = 1, 2, \ldots, t$, it uniquely determines $G(y_1, y_2, \ldots, y_t)$.


2.9 Regression

2.91 Regression Functions

If $X_1, X_2$ have the joint p. d. f. $f(x_1, x_2)$, we define the regression function of $X_1$ on $X_2$ as the mean value of $X_1$ for a fixed value $x_2$ of $X_2$, i. e.

(a)  $a_{1 \cdot x_2} = E(X_1 | X_2 = x_2) = \int_{-\infty}^{+\infty} x_1 f(x_1 | x_2)\, dx_1,$

where the conditional p. d. f. $f(x_1 | x_2)$ is defined by (g) of §2.4. We note that the regression function (a) is a function of $x_2$ only. The graph of this function is called the regression curve. If the regression function is linear,

(b)  $a_{1 \cdot x_2} = b x_2 + c,$

then we say that we have a case of linear regression, and call b and c the regression coefficients. The reader may show that if $X_1$ and $X_2$ are statistically independent, then the regression of $X_1$ on $X_2$ is linear, with $b = 0$ and $c = a_1$, the mean of $X_1$. We remark that the regression of $X_1$ on $X_2$ may be linear, while that of $X_2$ on $X_1$ is not.

If $X_1$, $X_2$ are discrete random variables, then in the notation of §2.12, we define the regression of $X_1$ on $X_2$ only for $X_2 = x_{2i}$, $i = 1, 2, \ldots$, by

(c)  $a_{1 \cdot x_{2i}} = E(X_1 | X_2 = x_{2i}) = \sum_j p_j x_{1j} \Big/ \sum_j p_j,$

where both summations are made for all j such that $x_{2j} = x_{2i}$. For the mixed case described in §2.12, we define the regression of $X_1$ on $X_2$ by

(d)  $a_{1 \cdot x_{2i}} = E(X_1 | X_2 = x_{2i}) = \int_{-\infty}^{+\infty} x_1 f(x_1 | x_{2i})\, dx_1.$
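For discrete variables the regression function is simply a table of conditional means, one for each admissible value of $X_2$. A minimal sketch follows; the joint probabilities and the function name are hypothetical, and the joint law is stored as a dictionary keyed by the pair $(x_1, x_2)$ rather than in the indexed notation of §2.12.

```python
def regression_function(joint):
    """Return {x2: E(X1 | X2 = x2)} for a joint pmf given as {(x1, x2): prob}."""
    num, den = {}, {}
    for (x1, x2), p in joint.items():
        num[x2] = num.get(x2, 0.0) + p * x1   # sum of p_j * x_1j over matching j
        den[x2] = den.get(x2, 0.0) + p        # sum of p_j over matching j
    return {x2: num[x2] / den[x2] for x2 in den if den[x2] > 0}

# Hypothetical joint distribution chosen only for illustration.
joint = {(0, 0): 0.2, (1, 0): 0.2, (0, 1): 0.1, (2, 1): 0.3, (4, 2): 0.2}
reg = regression_function(joint)
for x2 in sorted(reg):
    print(x2, reg[x2])
```

In this example the conditional means are not collinear in $x_2$, so the regression of $X_1$ on $X_2$ is not linear.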


We shall limit the discussion for more than two variables to the continuous case. For k random variables $X_1, X_2, \ldots, X_k$ let $f(x_1 | x_2, x_3, \ldots, x_k)$ be the conditional p. d. f. defined by (h) of §2.4. Then we define the regression function of $X_1$ on $X_2, X_3, \ldots, X_k$ to be

(e)  $a_{1 \cdot x_2 x_3 \ldots x_k} = E(X_1 | X_2 = x_2, X_3 = x_3, \ldots, X_k = x_k) = \int_{-\infty}^{+\infty} x_1 f(x_1 | x_2, x_3, \ldots, x_k)\, dx_1.$

If this function of $x_2, x_3, \ldots, x_k$ is linear,

(f)  $a_{1 \cdot x_2 x_3 \ldots x_k} = \sum_{j=2}^{k} b_j x_j + c,$

then the regression is said to be linear and the $b_j$ and c are called regression coefficients. Similarly, we may define the regression function of any X on the remaining X's. We note in conclusion that a regression function may always be regarded as the first moment of a conditional distribution.

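2.92 Variance about Regression Functions

The variance of $X_1$ for a fixed value $x_2$ of $X_2$ is defined as

(a)  $\sigma^2_{1 \cdot x_2} = \int_{-\infty}^{\infty} (x_1 - a_{1 \cdot x_2})^2 f(x_1 | x_2)\, dx_1.$

$\sigma^2_{1 \cdot x_2}$ is, in general, a function of $x_2$, and its mean value $\sigma^2_{1 \cdot 2}$ with respect to $x_2$ is known as the variance of $X_1$ about the regression function of $X_1$ on $X_2$. That is, we have

(b)  $\sigma^2_{1 \cdot 2} = \int_{-\infty}^{\infty} \sigma^2_{1 \cdot x_2} f_2(x_2)\, dx_2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x_1 - a_{1 \cdot x_2})^2 f(x_1, x_2)\, dx_1\, dx_2.$

In the k-variate case, we have

(c)  $\sigma^2_{1 \cdot x_2 x_3 \ldots x_k} = \int_{-\infty}^{\infty} (x_1 - a_{1 \cdot x_2 x_3 \ldots x_k})^2 f(x_1 | x_2, x_3, \ldots, x_k)\, dx_1,$

and the variance of $X_1$ about the regression function of $X_1$ on $X_2, X_3, \ldots, X_k$ is

(d)  $\sigma^2_{1 \cdot 23 \ldots k} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \sigma^2_{1 \cdot x_2 x_3 \ldots x_k} f(x_2, \ldots, x_k)\, dx_2 \cdots dx_k = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_1 - a_{1 \cdot x_2 x_3 \ldots x_k})^2 f(x_1, x_2, \ldots, x_k)\, dx_1\, dx_2 \cdots dx_k.$

The quantities given by (a), (b), (c) and (d) may be similarly defined for discrete and mixed cases, and also for empirical distributions.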

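2.93 Partial Correlation

Suppose $X_1, X_2, \ldots, X_k$ is a set of random variables. The covariance between any two of the variables, say $X_1$ and $X_2$, for fixed values of any set of the remaining variables, say $X_r, X_{r+1}, \ldots, X_k$ $(2 < r \le k)$, is defined as

(a)  $\sigma_{12 \cdot x_r x_{r+1} \ldots x_k} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x_1 - a_{1 \cdot x_r \ldots x_k})(x_2 - a_{2 \cdot x_r \ldots x_k})\, f(x_1, x_2 | x_r, \ldots, x_k)\, dx_1\, dx_2,$

and its mean value with respect to $x_r, x_{r+1}, \ldots, x_k$ is

(b)  $\sigma_{12 \cdot r(r+1) \ldots k} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \sigma_{12 \cdot x_r x_{r+1} \ldots x_k}\, f(x_r, x_{r+1}, \ldots, x_k)\, dx_r \cdots dx_k = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_1 - a_{1 \cdot x_r \ldots x_k})(x_2 - a_{2 \cdot x_r \ldots x_k})\, f(x_1, x_2, x_r, \ldots, x_k)\, dx_1\, dx_2\, dx_r \cdots dx_k.$

The partial correlation coefficient $\rho_{12 \cdot r(r+1) \ldots k}$ between $X_1$ and $X_2$ with respect to $X_r, X_{r+1}, \ldots, X_k$ is defined as

(c)  $\rho_{12 \cdot r(r+1) \ldots k} = \frac{\sigma_{12 \cdot r(r+1) \ldots k}}{\sigma_{1 \cdot r(r+1) \ldots k}\; \sigma_{2 \cdot r(r+1) \ldots k}}.$

The quantities defined in (a), (b) and (c) extend to discrete and mixed cases.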

2.94 Multiple Correlation

A procedure which is often carried out in statistics is that of determining best-fitting linear regression functions in the sense of least squares even though the actual regression function is "not quite" linear. The procedure is perhaps more often carried out with an empirical c. d. f. $F_n(x_1, x_2, \ldots, x_k)$ than with a probability c. d. f. Here, we shall only consider the case of a probability c. d. f. where the variables are all continuous. There will be analogous results for discrete and mixed cases (and also for the empirical distributions).

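In this problem we let $X_1, X_2, \ldots, X_k$ be random variables with c. d. f. $F(x_1, x_2, \ldots, x_k)$ and determine the constants $b_1, b_2, \ldots, b_k$ so that the mean value of the square $(X_1 - b_1 - \sum_{i=2}^{k} b_i X_i)^2$ is a minimum, i. e., so that

(a)  $S = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_1 - b_1 - \sum_{i=2}^{k} b_i x_i)^2\, dF(x_1, x_2, \ldots, x_k) = E[(X_1 - b_1 - \sum_{i=2}^{k} b_i X_i)^2]$

is a minimum.

The values of the b's which minimize S are given by solving the equations $\partial S / \partial b_i = 0$ $(i = 1, 2, \ldots, k)$. Writing out these equations we have (after dividing each equation by -2):

(b)  $a_1 - b_1 - b_2 a_2 - \cdots - b_k a_k = 0,$
     $c_{1j} - b_1 a_j - b_2 c_{2j} - b_3 c_{3j} - \cdots - b_k c_{kj} = 0, \quad (j = 2, 3, \ldots, k),$

where $a_i = E(X_i)$ and $c_{ij} = E(X_i X_j)$. Substituting the value of $b_1$ from the first equation into each of the remaining equations, and setting $C_{ij} = c_{ij} - a_i a_j = E[(X_i - a_i)(X_j - a_j)]$, the covariance between $X_i$ and $X_j$, we have the following equations to solve for $b_2, b_3, \ldots, b_k$:

(c)  $\sum_{i=2}^{k} b_i C_{ij} = C_{1j}, \quad (j = 2, 3, \ldots, k),$

from which we obtain by using Cramer's rule for solving linear equations

(d)  $b_i = \sum_{j=2}^{k} C_{1j} C^{ij}, \quad (i = 2, 3, \ldots, k),$

where $C^{ij}$ is the cofactor of $C_{ij}$ in $|C_{ij}|$ divided by $|C_{ij}|$, $|C_{ij}|$ being the determinant

  $\begin{vmatrix} C_{22} & C_{23} & \cdots & C_{2k} \\ C_{32} & C_{33} & \cdots & C_{3k} \\ \vdots & & & \vdots \\ C_{k2} & C_{k3} & \cdots & C_{kk} \end{vmatrix}.$

It is assumed, of course, that this determinant $\ne 0$.

For the value of $b_1$ we therefore have

(e)  $b_1 = a_1 - \sum_{i=2}^{k} b_i a_i = a_1 - \sum_{i,j=2}^{k} a_i C_{1j} C^{ij}.$

The least squares regression function of $X_1$ on $X_2, X_3, \ldots, X_k$ is thus

(f)  $b_1 + \sum_{i=2}^{k} b_i X_i,$

where the values of the b's are given by (d) and (e).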

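If we substitute the minimizing values of the b's, given by (d) and (e), in (a) we obtain the minimum value of S:

(g)  $\mathrm{Min}(S) = E[(X_1 - a_1 - \sum_{i,j=2}^{k} C_{1j} C^{ij} (X_i - a_i))^2] = C_{11} - 2 \sum_{i,j=2}^{k} C_{1i} C_{1j} C^{ij} + \sum_{i,j,i',j'=2}^{k} C_{1j} C_{1j'} C^{ij} C^{i'j'} C_{ii'}.$

If we sum the last expression first with respect to i, we find that $\sum_{i=2}^{k} C_{ii'} C^{ij} = 1$ if $i' = j$, and $= 0$ if $i' \ne j$. Hence summing on i and putting $i' = j$ the last expression reduces to $\sum_{j,j'=2}^{k} C_{1j} C_{1j'} C^{jj'}$, which is the same as $\sum_{i,j=2}^{k} C_{1i} C_{1j} C^{ij}$. Thus denoting Min(S) by $\sigma^2_{1 \cdot 23 \ldots k}$ we have

(h)  $\sigma^2_{1 \cdot 23 \ldots k} = C_{11} - \sum_{i,j=2}^{k} C_{1i} C_{1j} C^{ij} = \begin{vmatrix} C_{11} & C_{12} & \cdots & C_{1k} \\ C_{21} & C_{22} & \cdots & C_{2k} \\ \vdots & & & \vdots \\ C_{k1} & C_{k2} & \cdots & C_{kk} \end{vmatrix} \Bigg/ \begin{vmatrix} C_{22} & \cdots & C_{2k} \\ \vdots & & \vdots \\ C_{k2} & \cdots & C_{kk} \end{vmatrix}.$

To show that $\sigma^2_{1 \cdot 23 \ldots k}$ can be expressed as this ratio of determinants, let us note that the determinant in the numerator may be expressed as

(i)  $C_{11} \Gamma_{11} + C_{12} \Gamma_{12} + \cdots + C_{1k} \Gamma_{1k},$

where $\Gamma_{1i}$ = cofactor of $C_{1i}$ in the numerator determinant. Now, for $i = 2, 3, \ldots, k$,

(j)  $\Gamma_{1i} = - \sum_{j=2}^{k} C_{1j} \bar{C}_{ij},$

where $\bar{C}_{ij}$ is the cofactor of $C_{ij}$ in the determinant $|C_{ij}|$ $(i, j = 2, 3, \ldots, k)$. Hence the numerator determinant may be expressed as

(k)  $C_{11} \Gamma_{11} - \sum_{i,j=2}^{k} C_{1i} C_{1j} \bar{C}_{ij},$

where $\Gamma_{11} = |C_{ij}|$ $(i, j = 2, 3, \ldots, k)$. Dividing expression (k) by $|C_{ij}|$ and remembering that $\bar{C}_{ij} / |C_{ij}| = C^{ij}$ $(i, j = 2, 3, \ldots, k)$, we therefore establish the fact that $\sigma^2_{1 \cdot 23 \ldots k}$ can be expressed as the ratio of determinants given in (h). The quantity $\sigma^2_{1 \cdot 23 \ldots k}$ is the variance of $X_1$ about the least-square linear regression function (f), and should not be confused with the $\sigma^2_{1 \cdot 23 \ldots k}$ defined in §2.92.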

The correlation coefficient between $X_1$ and the regression function (f) is known as the multiple correlation coefficient between $X_1$ and $X_2, X_3, \ldots, X_k$ and is denoted by $R_{1 \cdot 23 \ldots k}$. To obtain an expression for the multiple correlation coefficient, we first determine the covariance between $X_1$ and the function (f), which is

  $E[(X_1 - a_1) \sum_{i,j=2}^{k} (X_i - a_i) C_{1j} C^{ij}] = \sum_{i,j=2}^{k} C_{1i} C_{1j} C^{ij}.$

The variance of $X_1$ is $C_{11}$ and that of (f) is

  $E[(\sum_{i,j=2}^{k} (X_i - a_i) C_{1j} C^{ij})^2],$

whose value is equal to the last expression in (g), and which has been reduced to $\sum_{i,j=2}^{k} C_{1i} C_{1j} C^{ij}$. Hence the multiple correlation coefficient is

  $R_{1 \cdot 23 \ldots k} = \frac{\sum_{i,j=2}^{k} C_{1i} C_{1j} C^{ij}}{\sqrt{C_{11} \sum_{i,j=2}^{k} C_{1i} C_{1j} C^{ij}}} = \sqrt{\frac{\sum_{i,j=2}^{k} C_{1i} C_{1j} C^{ij}}{C_{11}}}.$

It will be observed from (h) that

  $\sigma^2_{1 \cdot 23 \ldots k} = C_{11} (1 - R^2_{1 \cdot 23 \ldots k}),$

and hence by §2.72, $R^2_{1 \cdot 23 \ldots k} = 1$ if, and only if, all of the probability in the k-dimensional space of the random variables lies on the least-square regression surface.
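The relations (d), (h) and the formula for the multiple correlation coefficient may be verified numerically for k = 3. The covariance matrix below is hypothetical, chosen only so that it is positive definite, and the helper names are likewise assumptions; the sketch computes the residual variance both as $C_{11} - \sum C_{1i} C_{1j} C^{ij}$ and as the ratio of determinants.

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical covariance matrix C_ij of (X1, X2, X3).
C = [[4.0, 2.0, 1.0],
     [2.0, 3.0, 0.5],
     [1.0, 0.5, 2.0]]

sub = [[C[1][1], C[1][2]], [C[2][1], C[2][2]]]   # |C_ij|, i, j = 2, 3
d = det2(sub)
# C^{ij}: cofactor of C_ij in the 2x2 determinant divided by the determinant.
C_up = [[sub[1][1] / d, -sub[0][1] / d],
        [-sub[1][0] / d, sub[0][0] / d]]
c1 = [C[0][1], C[0][2]]                          # C_12, C_13

b = [sum(c1[j] * C_up[i][j] for j in range(2)) for i in range(2)]   # b_2, b_3 from (d)
quad = sum(c1[i] * c1[j] * C_up[i][j] for i in range(2) for j in range(2))
resid_var = C[0][0] - quad                       # C_11 - sum C_1i C_1j C^{ij}
ratio = det3(C) / d                              # ratio of determinants in (h)
R = (quad / C[0][0]) ** 0.5                      # multiple correlation coefficient

print(b, resid_var, ratio, R)
```

The two expressions for the residual variance agree, and the identity $\sigma^2_{1 \cdot 23} = C_{11}(1 - R^2_{1 \cdot 23})$ holds for the computed R.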

It should be noted that a partial correlation coefficient between $X_1$ and $X_2$ with respect to $X_r, X_{r+1}, \ldots, X_k$ could be determined for the case of a linear least-square regression function by replacing $a_{1 \cdot x_r x_{r+1} \ldots x_k}$ and $a_{2 \cdot x_r x_{r+1} \ldots x_k}$ by the corresponding linear least-square regression functions in determining $\sigma_{1 \cdot r(r+1) \ldots k}$, $\sigma_{2 \cdot r(r+1) \ldots k}$ and $\sigma_{12 \cdot r(r+1) \ldots k}$ in §2.93.

Again, we remark that analogous results can be obtained by using an empirical c. d. f. $F_n(x_1, x_2, \ldots, x_k)$ instead of a probability c. d. f. $F(x_1, x_2, \ldots, x_k)$.
SOME SPECIAL DISTRIBUTIONS

In the present chapter, the notions of the preceding chapter will be exemplified by considering certain distributions that arise frequently in applied statistics. We shall begin by considering distributions for the discrete case. Since the distinction between the random variable X and the corresponding independent variable x of the distribution function has been made clear, we shall henceforth denote both by the lower case x unless this leads to ambiguity.

3.1 Discrete Distributions

3.11 Binomial Distribution

An important distribution function of a discrete variate is the binomial distribution, which may be derived in the following manner. Suppose the probability of a "success" in a trial is p and the probability of a "failure" is $q = 1 - p$. For example the probability of a head in a toss of an "ideal" coin is $\frac{1}{2}$ and the probability of not a head (a tail) is $1 - \frac{1}{2} = \frac{1}{2}$. We can represent these probabilities in functional form $f(\alpha)$ where $f(\alpha) = p$ for $\alpha = 1$, a success, and $f(\alpha) = q$ for $\alpha = 0$, a failure. In other words $f(\alpha)$ is the probability of obtaining $\alpha$ successes in a single trial.

The probability associated with n trials which are mutually independent in the probability sense is

  $f(\alpha_1) \cdot f(\alpha_2) \cdots f(\alpha_n).$

The probability of x successes and n - x failures in a specified order, say $\alpha_1 = \alpha_2 = \cdots = \alpha_x = 1$, $\alpha_{x+1} = \cdots = \alpha_n = 0$, is

  $f(1)^x f(0)^{n-x} = p^x q^{n-x}.$

The number of orders in which x successes and n - x failures can occur is the number of combinations of n objects taken x at a time, which is

(a)  ${}_nC_x = \frac{n!}{x!\,(n-x)!}.$

These orders are mutually exclusive events. Hence, to find the probability $B(x)$, say, of exactly x successes irrespective of order we add the probabilities for all of the orders, thus obtaining

(b)  $B(x) = {}_nC_x\, p^x q^{n-x}.$

$B(x)$ will be recognized as the (x+1)-st term in the expansion of $(q + p)^n$. This demonstrates that the sum of the probabilities is equal to unity, i. e.

  $\sum_{x=0}^{n} B(x) = \sum_{x=0}^{n} {}_nC_x\, p^x q^{n-x} = (q + p)^n = 1.$

Hence $\sum_{x' \le x} B(x')$ is clearly a c. d. f. $F(x)$.
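The binomial probabilities and their sum are easily computed. The following is a minimal sketch using the standard library function `math.comb` for ${}_nC_x$; the values n = 10, p = 0.3 and the function names are arbitrary choices for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """B(x) = nCx * p**x * q**(n - x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x, n, p):
    """F(x) = sum of B(x') over integers 0 <= x' <= x."""
    return sum(binomial_pmf(xp, n, p) for xp in range(0, min(x, n) + 1))

n, p = 10, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
total = sum(probs)          # equals (q + p)**n = 1 up to rounding
print(total, binomial_cdf(3, n, p))
```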

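To derive the moments of the distribution $B(x)$ we will find it convenient to use the m. g. f.

  $\phi(\theta) = E(e^{x\theta}) = \sum_{x=0}^{n} {}_nC_x\, e^{x\theta} p^x q^{n-x} = (q + pe^{\theta})^n.$

The h-th moment of x can be expressed as

  $\mu'_h = \left.\frac{d^h \phi}{d\theta^h}\right|_{\theta=0}.$

In particular the mean E(x) is

  $\mu'_1 = \left.\frac{d}{d\theta}(q + pe^{\theta})^n\right|_{\theta=0} = \left. npe^{\theta}(q + pe^{\theta})^{n-1}\right|_{\theta=0} = np,$

and the second moment about zero is

  $\mu'_2 = \left.\frac{d^2}{d\theta^2}(q + pe^{\theta})^n\right|_{\theta=0} = \left[ npe^{\theta}(q + pe^{\theta})^{n-1} + n(n-1)p^2 e^{2\theta}(q + pe^{\theta})^{n-2} \right]_{\theta=0} = np + n(n-1)p^2.$

Therefore, the variance is

  $\sigma^2 = np + n(n-1)p^2 - n^2p^2 = np - np^2 = npq.$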
Example: Applying the binomial distribution to the coin tossing problem, we have $p = \frac{1}{2}$ and $q = \frac{1}{2}$. The probability of x heads is

  $B(x) = {}_nC_x (\tfrac{1}{2})^x (\tfrac{1}{2})^{n-x} = {}_nC_x (\tfrac{1}{2})^n.$

The mean and variance are, respectively,

  $\mu'_1 = \frac{n}{2}, \qquad \sigma^2 = \frac{n}{4}.$

In deducing $B(x)$ we have assumed that p remains constant from trial to trial. If the probability is different for each trial, our conclusions must be modified. Let $p_i$ be the probability of a success in the i-th trial $(i = 1, 2, \ldots, n)$ and $q_i = 1 - p_i$ the corresponding probability of a failure. Let

  $p = \frac{1}{n} \sum_{i=1}^{n} p_i, \qquad q = 1 - p.$

Then the expected value of $x = \sum_{i=1}^{n} \alpha_i$, the total number of successes in n trials, is

  $E(\alpha_1 + \cdots + \alpha_n) = E(\alpha_1) + \cdots + E(\alpha_n) = p_1 + \cdots + p_n = np.$

The variance of $\alpha_i$ is $p_i q_i$. Since the trials are independent the variance of $x = \sum_{i=1}^{n} \alpha_i$ is

  $\sigma^2 = \sum_{i=1}^{n} p_i q_i.$

Noting that $p_i = p + (p_i - p)$ and $q_i = q - (p_i - p)$ we can write the variance

(f)  $\sigma^2 = \sum_{i=1}^{n} [p + (p_i - p)][q - (p_i - p)] = \sum_{i=1}^{n} [pq - (p_i - p)(p - q) - (p_i - p)^2] = npq - \sum_{i=1}^{n} (p_i - p)^2,$

since $\sum_{i=1}^{n} (p_i - p) = 0$. This is less than the variance, npq, we found above, unless all the $p_i$ are equal. When the probability is constant from trial to trial, the distribution is known as the Bernoulli case; when the probability varies, we have the Poisson case.
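Formula (f) can be checked by building the exact distribution of the number of successes when the probability varies from trial to trial. In the sketch below the four trial probabilities and the function name are hypothetical; the distribution is obtained by adding one trial at a time.

```python
def success_count_pmf(ps):
    """pmf of the number of successes in independent trials with probabilities ps."""
    pmf = [1.0]                      # zero trials: zero successes with certainty
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for x, w in enumerate(pmf):
            nxt[x] += w * (1 - p)    # this trial fails
            nxt[x + 1] += w * p      # this trial succeeds
        pmf = nxt
    return pmf

ps = [0.1, 0.5, 0.9, 0.3]            # hypothetical p_i
n = len(ps)
pmf = success_count_pmf(ps)
p_bar = sum(ps) / n
q_bar = 1 - p_bar

mean = sum(x * w for x, w in enumerate(pmf))
var = sum((x - mean) ** 2 * w for x, w in enumerate(pmf))
formula = n * p_bar * q_bar - sum((p - p_bar) ** 2 for p in ps)
print(mean, var, formula, n * p_bar * q_bar)
```

The computed variance agrees with (f) and falls below the Bernoulli value npq for the same mean probability.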

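In §2.71 it was proved that if a variate x is distributed about the mean a with the variance $\sigma^2$, we have the Tchebycheff inequality

  $\Pr(|x - a| > \delta\sigma) \le \frac{1}{\delta^2}$

for any $\delta > 0$. In the binomial distribution x has mean np and variance npq. Let us change to the variate $r = \frac{x}{n}$, the "relative frequency" of successes. We have $E(r) = E(\frac{x}{n}) = \frac{1}{n}E(x) = \frac{np}{n} = p$. Similarly, $\sigma_r^2 = \frac{1}{n^2}\sigma_x^2 = \frac{pq}{n}$. The Tchebycheff inequality states that

  $\Pr\left(|r - p| > \delta\sqrt{\frac{pq}{n}}\right) \le \frac{1}{\delta^2}.$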




  $f(y_{(1)}) \cdot f(y_{(2)}) \cdots f(y_{(n)}),$

where each of the y's will have one of the values $y_1, y_2, \ldots, y_k$, where $f(y_i) = p_i$ $(i = 1, 2, \ldots, k)$.

We now wish to find the probability that $x_1$ of the y's are $y_1$'s, $x_2$ of the y's are $y_2$'s, etc., $(\sum_{i=1}^{k} x_i = n)$.

The probability of $x_1$ events characterized by $y_1$, etc., occurring in a specified order, say $y_{(1)} = \cdots = y_{(x_1)} = y_1$, $y_{(x_1+1)} = \cdots = y_{(x_1+x_2)} = y_2$, ..., $y_{(n)} = y_k$, is

  $f(y_1)^{x_1} f(y_2)^{x_2} \cdots f(y_k)^{x_k} = p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}.$

The number of different orders in which we can get $x_1$ $y_1$'s, etc., is the number of ways in which n objects can be permuted where $x_1$ are of type $C_1$, ..., $x_k$ are of type $C_k$, that is

  $\frac{n!}{x_1!\, x_2! \cdots x_k!}.$

So the probability of $x_1$ $y_1$'s, $x_2$ $y_2$'s, etc., irrespective of the order in which they occur is given by adding the probabilities of various possible orders. We obtain

  $M(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}.$

This may be recognized as the general term in the expansion of $(p_1 + p_2 + \cdots + p_k)^n$. Hence, the sum of $M(x_1, x_2, \ldots, x_k)$ over all partitions of n, that is, all sets of $x_i$ $(\sum_{i=1}^{k} x_i = n,\ x_i \ge 0)$, is unity.


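To find the means, variances, covariances, and higher moments we set up the m. g. f.

  $\phi(\theta_1, \theta_2, \ldots, \theta_k) = E\left(e^{\sum_{i=1}^{k} \theta_i x_i}\right) = \sum_{\sum x_i = n} \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}\, e^{\sum \theta_i x_i}$
  $\qquad = \sum_{\sum x_i = n} \frac{n!}{x_1!\, x_2! \cdots x_k!}\, (p_1 e^{\theta_1})^{x_1} (p_2 e^{\theta_2})^{x_2} \cdots (p_k e^{\theta_k})^{x_k} = (p_1 e^{\theta_1} + \cdots + p_k e^{\theta_k})^n.$

The mean of $x_i$ is

(c)  $E(x_i) = \left.\frac{\partial \phi}{\partial \theta_i}\right|_{\theta\text{'s}=0} = \left. np_i e^{\theta_i} (p_1 e^{\theta_1} + \cdots + p_k e^{\theta_k})^{n-1} \right|_{\theta\text{'s}=0} = np_i.$

And

  $E(x_i^2) = \left.\frac{\partial^2 \phi}{\partial \theta_i^2}\right|_{\theta\text{'s}=0} = \left[ np_i e^{\theta_i} (p_1 e^{\theta_1} + \cdots + p_k e^{\theta_k})^{n-1} + n(n-1) p_i^2 e^{2\theta_i} (p_1 e^{\theta_1} + \cdots + p_k e^{\theta_k})^{n-2} \right]_{\theta\text{'s}=0} = np_i + n(n-1)p_i^2.$

Therefore, the variance of $x_i$ is

(d)  $\sigma_i^2 = np_i + n(n-1)p_i^2 - n^2 p_i^2 = np_i(1 - p_i).$

In a similar manner we find the covariance between $x_i$ and $x_j$ to be $-np_i p_j$. It is clear that the binomial distribution is the special case when k = 2.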
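For small n and k the sum of $M(x_1, \ldots, x_k)$ over all partitions of n, and the moments found above, can be verified by direct enumeration. The sketch below takes k = 3; the values of n and the $p_i$ and the function name are assumptions made for the illustration.

```python
from math import factorial

def multinomial_pmf(xs, ps):
    """M(x_1, ..., x_k) = n! / (x_1! ... x_k!) * p_1**x_1 ... p_k**x_k."""
    coef = factorial(sum(xs))
    for x in xs:
        coef //= factorial(x)        # each partial quotient is an integer
    prob = float(coef)
    for x, p in zip(xs, ps):
        prob *= p**x
    return prob

n, ps = 6, (0.2, 0.3, 0.5)
# All partitions (x1, x2, x3) of n with x_i >= 0.
support = [(x1, x2, n - x1 - x2) for x1 in range(n + 1) for x2 in range(n + 1 - x1)]

total = sum(multinomial_pmf(xs, ps) for xs in support)
mean1 = sum(xs[0] * multinomial_pmf(xs, ps) for xs in support)
var1 = sum((xs[0] - n * ps[0]) ** 2 * multinomial_pmf(xs, ps) for xs in support)
cov12 = sum((xs[0] - n * ps[0]) * (xs[1] - n * ps[1]) * multinomial_pmf(xs, ps)
            for xs in support)
print(total, mean1, var1, cov12)
```

The enumerated values should reproduce $np_1$, $np_1(1 - p_1)$ and $-np_1p_2$.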

3.13 The Poisson Distribution

The Poisson distribution is in a sense a particular limiting form of the binomial distribution. We shall deduce it from geometrical considerations. Let AB be a line segment of length L and CD a segment of length l contained in AB.

[Figure 3: the segment AB with the points C and D marked on it, in the order A, C, D, B.]

Let the probability that a point taken at random falls on an interval of length du be $\frac{du}{L}$; that is, the p. d. f. of u is a constant. The probability of the point falling in CD is $\frac{l}{L}$. If we let n points fall at random on AB, the probability that exactly x of them fall on CD is given by the binomial distribution ((a) of §3.11)

  $B(x) = \frac{n!}{x!\,(n-x)!} \left(\frac{l}{L}\right)^x \left(1 - \frac{l}{L}\right)^{n-x}.$

Now let n and L increase indefinitely in such a way that the average number of points per unit length is a finite number $k \ne 0$, i. e., $\frac{n}{L} = k$. Now

  $B(x) = \frac{n(n-1) \cdots (n-x+1)}{x!\, n^x} \left(\frac{nl}{L}\right)^x \left(1 - \frac{1}{n} \cdot \frac{nl}{L}\right)^{n-x} = \frac{n(n-1) \cdots (n-x+1)}{n^x} \cdot \frac{(kl)^x}{x!} \left(1 - \frac{kl}{n}\right)^{n-x}.$

So the limiting value of $B(x)$ for a given x is

  $\lim_{n \to \infty} B(x) = \frac{(kl)^x e^{-kl}}{x!}.$
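The passage to the limit can be observed numerically. In the sketch below the density k of points and the length l are hypothetical, the probability of a point falling in CD is taken as $kl/n$ (that is, $l/L$ with $n/L = k$), and the limiting probabilities are taken to be the Poisson values with mean $kl$; the largest discrepancy over the first few values of x is printed for increasing n.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, m):
    """Poisson probability e**(-m) * m**x / x!."""
    return exp(-m) * m**x / factorial(x)

k_rate, l = 2.0, 1.5       # hypothetical points per unit length, and length of CD
m = k_rate * l             # limiting mean number of points falling on CD

def max_gap(n, xmax=10):
    """Largest |B(x) - Poisson(x)| for x = 0, ..., xmax when l / L = m / n."""
    p = m / n
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, m)) for x in range(xmax + 1))

gaps = [max_gap(n) for n in (10, 100, 1000, 10000)]
print(gaps)
```

The discrepancies shrink as n grows, in keeping with the limiting argument.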




This argument given for one dimension immediately extends to two or more dimensions. For example, for two dimensions we would take AB and CD to be regions of the plane, the latter contained in the former, and k to be the limiting ratio of the number of points per unit area. The Poisson distribution is applicable to problems dealing with occurrence of events in a time interval of a given length such as emission of rays from radioactive substances, certain traffic problems, demands for telephone service and bacterial count in cells.

Example: Let us consider the following problem as an example to which the Poisson distribution is applicable. If X-rays are considered as discrete quanta and if the absorption of k or more will kill a certain unicellular organism, what is the probability that an organism of a given size S on a given glass slide will escape death by X-rays after being exposed for t seconds? On the assumption that the projection of the organism of size S on a plane has an area of a, and m is the average number of rays striking an area of size a in t seconds, and the rays appear independently and at random, then the probability that x of the X-rays hit the organism in t seconds is

  $p(x) = \frac{e^{-m} m^x}{x!}.$

Hence, the probability of survival is $\sum_{x=0}^{k-1} p(x)$. The average number of rays absorbed by the survivors is

  $\sum_{x=0}^{k-1} x\, p(x) \Big/ \sum_{x=0}^{k-1} p(x).$
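The quantities in this example are finite Poisson sums and are readily evaluated. The sketch below treats the average number absorbed by survivors as the mean of x conditional on x < k, which is an interpretation assumed here; the function names and the numerical values of k and m are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    return exp(-m) * m**x / factorial(x)

def survival_probability(k, m):
    """Probability that fewer than k rays are absorbed."""
    return sum(poisson_pmf(x, m) for x in range(k))

def mean_absorbed_by_survivors(k, m):
    """Mean number of rays absorbed, given that fewer than k were absorbed."""
    return sum(x * poisson_pmf(x, m) for x in range(k)) / survival_probability(k, m)

# Hypothetical values: death on absorbing 3 or more rays, one ray expected per exposure.
print(survival_probability(3, 1.0), mean_absorbed_by_survivors(3, 1.0))
```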



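3.14 The Negative Binomial Distribution

Another discrete distribution which is closely related to the Bernoulli binomial distribution is the negative binomial. If we expand, according to the binomial theorem,

  $(q - p)^{-k},$

where $q = 1 + p$, $k > 0$, $p > 0$, we get as the general term

(a)  $p(x) = \frac{k(k+1) \cdots (k+x-1)}{x!} \cdot \frac{p^x}{q^{k+x}}.$

When we interpret this as a probability function of x, p(x), it is called the negative binomial distribution and is defined for $x = 0, 1, 2, \ldots$ We notice that the sum of p(x) for all x is unity,

  $\sum_{x=0}^{\infty} p(x) = \sum_{x=0}^{\infty} \frac{k(k+1) \cdots (k+x-1)}{x!} \cdot \frac{p^x}{q^{k+x}} = (q - p)^{-k} = 1.$

The m. g. f. is

(b)  $\phi(\theta) = \sum_{x=0}^{\infty} e^{\theta x} p(x) = \sum_{x=0}^{\infty} \frac{k(k+1) \cdots (k+x-1)}{x!} \cdot \frac{(pe^{\theta})^x}{q^{k+x}} = (q - pe^{\theta})^{-k}.$

From this we find the mean

(c)  $E(x) = \left.\frac{d\phi}{d\theta}\right|_{\theta=0} = \left. kpe^{\theta}(q - pe^{\theta})^{-k-1}\right|_{\theta=0} = kp.$

Similarly,

  $E(x^2) = \left.\frac{d^2\phi}{d\theta^2}\right|_{\theta=0} = \left[ kpe^{\theta}(q - pe^{\theta})^{-k-1} + k(k+1)p^2 e^{2\theta}(q - pe^{\theta})^{-k-2} \right]_{\theta=0} = kp + k(k+1)p^2.$

Therefore the variance is

  $\sigma^2 = kp + k(k+1)p^2 - k^2p^2 = kp + kp^2 = kpq.$

The similarity of this m. g. f. and these moments to those of the positive binomial distribution should be noted.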
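The sum, mean and variance of the negative binomial distribution may be checked by summing enough terms of the series. The sketch below writes the general term with the product $k(k+1) \cdots (k+x-1)$ so that k need not be an integer; the parameter values, the truncation point and the function name are assumptions made for the illustration.

```python
def neg_binomial_pmf(x, k, p):
    """General term of (q - p)**(-k) with q = 1 + p:
    k (k+1) ... (k+x-1) / x!  *  p**x / q**(k + x)."""
    q = 1.0 + p
    coef = 1.0
    for j in range(x):
        coef *= (k + j) / (j + 1)
    return coef * p**x / q**(k + x)

k, p = 2.5, 0.8                      # hypothetical parameters
q = 1.0 + p
probs = [neg_binomial_pmf(x, k, p) for x in range(400)]   # the tail beyond is negligible

total = sum(probs)
mean = sum(x * w for x, w in enumerate(probs))
var = sum((x - mean) ** 2 * w for x, w in enumerate(probs))
print(total, mean, k * p, var, k * p * q)
```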

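It can easily be shown that a special limiting case of the negative binomial distribution is the Poisson law. If we let $p \to 0$ and $k \to \infty$ in such a way that

  $kp = m,$

then, since $p = \frac{m}{k}$ and $q = 1 + \frac{m}{k}$,

  $\lim p(x) = \lim_{k \to \infty} \frac{k(k+1) \cdots (k+x-1)}{x!} \cdot \frac{(m/k)^x}{(1 + m/k)^{k+x}} = \frac{m^x}{x!} \lim_{k \to \infty} \left(1 + \frac{m}{k}\right)^{-k} \cdot \frac{k(k+1) \cdots (k+x-1)}{k^x} \left(1 + \frac{m}{k}\right)^{-x} = \frac{m^x e^{-m}}{x!}.$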
If we make a change of parameters, we have the usual expression for the Pólya-Eggenberger distribution. Let

  $k = \frac{h}{d}, \qquad p = d.$

Then the distribution may be written as

(e)  $p(x) = (1 + d)^{-\frac{h}{d} - x}\, \frac{h(h+d) \cdots (h + (x-1)d)}{x!}.$

This distribution, one of a number of contagious distributions, is useful in describing, for example, the probability of x cases of a given epidemic in a given locality.

If we interpret $\frac{1}{q}$ as the probability of a "success" and $\frac{p}{q}$ as the probability of a "failure" in a trial, then it will be seen that (a) is the probability that x + k trials will be required to obtain k successes. For the probability of obtaining k - 1 successes
defined over the range $-\infty < x < \infty$ where k, h, and c are constants. Various attempts have been made to establish this distribution from postulates and other primitive assumptions. Gauss, for example, deduced it from the postulate of the arithmetic mean which states, roughly, that for a set of equally valid observations of a quantity the arithmetic mean is the most probable value. Pearson derived it as a solution of a certain differential equation. It can be shown that it is the limiting distribution of the Bernoulli binomial distribution. We shall not derive the normal distribution from more basic considerations, but we shall observe that it arises under rather broad conditions as a limiting distribution in many situations involving a large number of variates.

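We can determine k in the distribution by requiring that the integral over the entire range be unity. If we let $u = h(x - c)$, we wish

  $\int_{-\infty}^{\infty} dF(x) = \frac{k}{h} \int_{-\infty}^{\infty} e^{-u^2}\, du = 1.$

To evaluate the integral $I = \int_{-\infty}^{\infty} e^{-u^2}\, du$ we observe that

  $I^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(u^2 + v^2)}\, du\, dv.$

Changing to polar coordinates $u = r \cos\theta$, $v = r \sin\theta$, we get

  $I^2 = \int_0^{2\pi} \int_0^{\infty} e^{-r^2} r\, dr\, d\theta = \int_0^{2\pi} \tfrac{1}{2}\, d\theta = \pi.$

Therefore, we take $k = \frac{h}{\sqrt{\pi}}$.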
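The mean of the distribution is

  $E(x) = \frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} x e^{-h^2(x-c)^2}\, dx = c\, \frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-h^2(x-c)^2}\, dx + \frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} (x - c) e^{-h^2(x-c)^2}\, dx.$

The latter integral is zero because the integrand is an odd function of x - c. So

  $a = E(x) = c\, \frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-h^2(x-c)^2}\, dx = c.$

The variance is found by integration by parts:

  $\sigma^2 = E((x - c)^2) = \frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} (x - c)^2 e^{-h^2(x-c)^2}\, dx = \frac{1}{2h^2}.$

We usually write the normal distribution with c and h expressed in terms of a and $\sigma^2$ respectively, i. e.,

  $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-a)^2}{2\sigma^2}}.$

We shall refer to this distribution as $N(a, \sigma^2)$.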


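To find higher moments (about the mean) it is convenient to use the m. g. f. of the normalized variate $\frac{x - a}{\sigma}$:

  $\phi(\theta) = E\left(e^{\theta \frac{x-a}{\sigma}}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{\theta \frac{x-a}{\sigma}}\, e^{-\frac{(x-a)^2}{2\sigma^2}}\, dx = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-\frac{1}{2}\left(\frac{x-a}{\sigma} - \theta\right)^2 + \frac{1}{2}\theta^2}\, dx.$

Setting $\frac{x-a}{\sigma} - \theta = y$, the last integral becomes

  $\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}y^2 + \frac{1}{2}\theta^2}\, dy = e^{\frac{1}{2}\theta^2}.$

Hence,

  $\phi(\theta) = e^{\frac{1}{2}\theta^2}.$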
It should be noticed that the normal distribution is symmetrical with respect to the line x = a, its mean. The smaller the value of $\sigma^2$ is, the greater the concentration about the mean. In fact $\sigma$ is the distance from the mean to the points of inflection.

[Figure: graph of the normal density f(x).]

Because of its wide application and because of its theoretical importance, the normal distribution has been the origin of much of the terminology and many of the concepts in statistics.

The integral

  $F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{u^2}{2}}\, du$

is widely tabulated; the ordinate

  $\frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$

is also tabulated in many places. The value of x for which

  $\frac{1}{\sqrt{2\pi}} \int_{-x}^{+x} e^{-\frac{u^2}{2}}\, du = \frac{1}{2}$

is called the probable error and is approximately .6745.
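The value quoted for the probable error can be recovered without tables. The sketch below relies on the identity that the probability of the interval (-x, +x) under N(0, 1) equals $\mathrm{erf}(x/\sqrt{2})$, uses the standard library function `math.erf`, and solves for x by bisection; the bracketing interval, the tolerance and the function names are arbitrary choices.

```python
from math import erf, sqrt

def central_probability(x):
    """Probability that a N(0, 1) variate falls in (-x, +x)."""
    return erf(x / sqrt(2.0))

def probable_error(tol=1e-12):
    """Solve central_probability(x) = 1/2 by bisection on [0, 2]."""
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if central_probability(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(probable_error())
```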

It can be readily verified by applying Theorem (C), §4.21, that as $n \to \infty$ the normalized variable $\frac{x - np}{\sqrt{npq}}$, where x is distributed according to the binomial law, has the limiting distribution N(0, 1). For we may write

  $\frac{x - np}{\sqrt{npq}} = \frac{\sum_{i=1}^{n} (x_i - p)}{\sqrt{npq}},$

where $x_1, x_2, \ldots, x_n$ are independently distributed according to the law $p(x) = p^x (1 - p)^{1-x}$, (x = 0 or 1). The mean of this distribution is $E(x) = \sum_{x=0}^{1} x\, p^x (1 - p)^{1-x} = p$, and the variance is $\sigma^2 = \sum_{x=0}^{1} (x - p)^2 p^x (1 - p)^{1-x} = pq$. The applicability of Theorem (C), §4.21, is then obvious.



3.22 The Normal Bivariate Distribution 

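The extension of the normal probability density function to the case of two variables, $x_1$ and $x_2$, is straightforward. We replace $(x - a)^2$ by a quadratic form in $x_1 - a_1$ and $x_2 - a_2$. The distribution may be written

  $K e^{-\frac{1}{2}Q},$

where $Q = A_{11} y_1^2 + 2A_{12} y_1 y_2 + A_{22} y_2^2$, $y_i = x_i - a_i$, and $K > 0$, $A_{11} > 0$, $A_{22} > 0$, $A_{12}$ are constants such that $A_{11}A_{22} > A_{12}^2$. These inequalities on the A's are necessary and sufficient conditions for Q to be a positive definite quadratic form in $y_1$ and $y_2$, i. e., $Q > 0$ unless $y_1 = y_2 = 0$. We wish to determine K so that the integral of the p. d. f. over the $x_1x_2$-plane is unity. The integral transforms to

(a)  $K \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{1}{2}\left[A_{11}\left(y_1 + \frac{A_{12}}{A_{11}} y_2\right)^2 + \left(A_{22} - \frac{A_{12}^2}{A_{11}}\right) y_2^2\right]}\, dy_1\, dy_2.$

If we let $y_1 + \frac{A_{12}}{A_{11}} y_2 = z_1$, and integrate $z_1$ and $y_2$ in (a) from $-\infty$ to $+\infty$, and use the fact that

  $\int_{-\infty}^{\infty} e^{-\frac{1}{2}cx^2}\, dx = \sqrt{\frac{2\pi}{c}}, \quad (c > 0),$

we obtain for (a)

  $\frac{2\pi K}{\sqrt{A_{11}A_{22} - A_{12}^2}}.$

If the integral is to be unity, we must choose

  $K = \frac{\sqrt{A_{11}A_{22} - A_{12}^2}}{2\pi} = \frac{\sqrt{A}}{2\pi},$

where A is the determinant

  $\begin{vmatrix} A_{11} & A_{12} \\ A_{12} & A_{22} \end{vmatrix}.$

We may, therefore, write the distribution as

(b)  $\frac{\sqrt{A}}{2\pi}\, e^{-\frac{1}{2} \sum_{i,j=1}^{2} A_{ij}(x_i - a_i)(x_j - a_j)},$

where $A_{12} = A_{21}$.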
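In order to find the means, variances, and covariance of $x_1$ and $x_2$, it will be convenient to obtain the m. g. f. of $(x_1 - a_1)$ and $(x_2 - a_2)$, i. e.

  $\phi(\theta_1, \theta_2) = E\left(e^{\sum_{i=1}^{2} \theta_i (x_i - a_i)}\right) = \frac{\sqrt{A}}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{1}{2}Q + \sum_{i=1}^{2} \theta_i (x_i - a_i)}\, dx_1\, dx_2.$

Letting $x_i - a_i = y_i$ we have

  $\phi(\theta_1, \theta_2) = \frac{\sqrt{A}}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(A_{11}y_1^2 + 2A_{12}y_1y_2 + A_{22}y_2^2) + \theta_1 y_1 + \theta_2 y_2}\, dy_1\, dy_2$
  $\qquad = \frac{\sqrt{A}}{2\pi}\, e^{R} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{1}{2}A_{11}\left(y_1 + \frac{A_{12}}{A_{11}}y_2 - \frac{\theta_1}{A_{11}}\right)^2 - \frac{1}{2}\left(A_{22} - \frac{A_{12}^2}{A_{11}}\right)\left(y_2 + \frac{A_{12}\theta_1 - A_{11}\theta_2}{A}\right)^2}\, dy_1\, dy_2,$

where $R = \frac{\theta_1^2 A_{22} - 2\theta_1\theta_2 A_{12} + \theta_2^2 A_{11}}{2A} = \frac{1}{2} \sum_{i,j=1}^{2} A^{ij} \theta_i \theta_j$, where $A^{ij}$ = cofactor of $A_{ij}$ in A, divided by A.

Making the change of variables

  $z_1 = y_1 + \frac{A_{12}}{A_{11}} y_2 - \frac{\theta_1}{A_{11}}, \qquad z_2 = y_2 + \frac{A_{12}\theta_1 - A_{11}\theta_2}{A},$

and integrating with respect to $z_1$ and $z_2$, we obtain

(e)  $\phi(\theta_1, \theta_2) = e^{\frac{1}{2} \sum_{i,j=1}^{2} A^{ij} \theta_i \theta_j}.$

Now consider the problem of finding the mean values of $x_1$ and $x_2$. We have

  $E(x_1 - a_1) = \left.\frac{\partial \phi}{\partial \theta_1}\right|_{\theta_1 = \theta_2 = 0} = \left.(\theta_1 A^{11} + \theta_2 A^{12})\, \phi(\theta_1, \theta_2)\right|_{\theta_1 = \theta_2 = 0} = 0.$

Hence $E(x_1) = a_1$. Similarly $E(x_2) = a_2$.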

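To find the variances and covariances of $x_1$ and $x_2$, we must take second derivatives. Thus to find the variance of $x_1$ we have

  $\sigma_1^2 = E[(x_1 - a_1)^2] = \left.\frac{\partial^2 \phi}{\partial \theta_1^2}\right|_{\theta_1 = \theta_2 = 0} = \left.[A^{11} + (\theta_1 A^{11} + \theta_2 A^{12})^2]\, \phi(\theta_1, \theta_2)\right|_{\theta_1 = \theta_2 = 0} = A^{11}.$

Similarly,

  $\sigma_2^2 = A^{22}.$

For the covariance, we have

  $\rho\sigma_1\sigma_2 = E[(x_1 - a_1)(x_2 - a_2)] = \left.\frac{\partial^2 \phi}{\partial \theta_1\, \partial \theta_2}\right|_{\theta_1 = \theta_2 = 0} = \left.[A^{12} + (\theta_1 A^{11} + \theta_2 A^{12})(\theta_2 A^{22} + \theta_1 A^{12})]\, \phi(\theta_1, \theta_2)\right|_{\theta_1 = \theta_2 = 0} = A^{12}.$

If the three equations

  $\sigma_1^2 = A^{11}, \qquad \sigma_2^2 = A^{22}, \qquad \rho\sigma_1\sigma_2 = A^{12}$

are solved for $A_{11}$, $A_{12}$, $A_{22}$, we obtain

(f)  $A_{11} = \frac{1}{\sigma_1^2 (1 - \rho^2)}, \qquad A_{22} = \frac{1}{\sigma_2^2 (1 - \rho^2)}, \qquad A_{12} = \frac{-\rho}{\sigma_1 \sigma_2 (1 - \rho^2)}.$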

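We may summarize as follows:

Theorem (A): If $x_1, x_2$ are distributed according to the bivariate normal distribution

(b)  $\frac{\sqrt{A}}{2\pi}\, e^{-\frac{1}{2} \sum_{i,j=1}^{2} A_{ij}(x_i - a_i)(x_j - a_j)},$

the m. g. f. of $(x_1 - a_1)$ and $(x_2 - a_2)$ is given by (e); $E(x_i) = a_i$, (i = 1, 2); the variance of $x_i$ is $A^{ii}$ (i = 1, 2) and the covariance between $x_1$ and $x_2$ is $A^{12}$. $A_{11}$, $A_{12}$, $A_{22}$ are expressed in terms of variances and the correlation coefficient between $x_1$ and $x_2$ by (f).

Expressing $A_{11}$, $A_{12}$, $A_{22}$ in terms of $\sigma_1$, $\sigma_2$ and $\rho$, the distribution (b) may be written as

  $f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\, \exp\left\{-\frac{1}{2(1 - \rho^2)} \left[\frac{(x_1 - a_1)^2}{\sigma_1^2} - 2\rho\, \frac{(x_1 - a_1)(x_2 - a_2)}{\sigma_1 \sigma_2} + \frac{(x_2 - a_2)^2}{\sigma_2^2}\right]\right\}.$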
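The marginal distribution of this density with respect to $x_1$ is the distribution of $x_1$. Thus integrating $f(x_1, x_2)$ with respect to $x_2$ we obtain as the distribution of $x_1$

  $f_1(x_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x_1 - a_1)^2}{2\sigma_1^2}}.$

A similar expression holds for the distribution of $x_2$.

We would also like to know the conditional probability function

  $f(x_2 | x_1) = \frac{f(x_1, x_2)}{f_1(x_1)}.$

Substituting the expressions for $f(x_1, x_2)$ and $f_1(x_1)$, we find

  $f(x_2 | x_1) = \frac{1}{\sqrt{2\pi}\,\sigma_2 \sqrt{1 - \rho^2}}\, \exp\left\{-\frac{\left[x_2 - a_2 - \rho \frac{\sigma_2}{\sigma_1}(x_1 - a_1)\right]^2}{2\sigma_2^2 (1 - \rho^2)}\right\}.$

Thus, for a fixed value of $x_1$, $x_2$ is distributed according to $N(a_2 + \rho \frac{\sigma_2}{\sigma_1}(x_1 - a_1),\ \sigma_2^2(1 - \rho^2))$.

In a similar way we can show that the marginal distribution of $x_2$ is $N(a_2, \sigma_2^2)$ and the conditional probability of $x_1$, given $x_2$, is $N(a_1 + \rho \frac{\sigma_1}{\sigma_2}(x_2 - a_2),\ \sigma_1^2(1 - \rho^2))$.

It will be observed that if $\rho = 0$, the marginal and the conditional probability distributions of $x_1$ (or $x_2$) are identical.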

Since the ocndltlonal distribution of Xg Is N(ag+p^(x,-a, ), <rg(i-p^)), the mean 
value of Xg for the Interval (x,,x^+dx^) Is slnqjly ag-tp^(x,-a^ ). So the regression 
function of Xg on x^ Is linear, that Is, 


Similarly 


'2.x, = ag + p 5 ^ (x,-a,). 


^Kx « + p — (Xg-ag). 


Since (Tg (1-p^) la the variance of Xg about the meanag^^ in the conditional probability 
dlatributlon, the nearer p^ I 3 to i , the amaller I 3 this variance. If p « 0, Xg doee not 
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depend on ; the two variates are Independent and 

_ _ (3tg-ag)^ 

1 1 

“ iss ;' ‘ 15 ^’ 


3.23 The Normal Multivariate Distribution

Let us now consider the extension of §3.22 to the case of k variates. Let

$$f(x_1,x_2,\ldots,x_k) = C\, e^{-\frac12\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j)},$$

where $\|A_{ij}\|$ is a symmetric, positive definite matrix, that is, $A_{ij} = A_{ji}$ and

$$\sum_{i,j=1}^{k}A_{ij}t_it_j > 0$$

for real $t_i$ not all zero.

We wish to determine C so that the integral over the entire range, $-\infty < x_i < \infty$, is unity. We must have

$$\frac{1}{C} = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-\frac12\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k.$$


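To evaluate this integral, we transform the variables. Let

$$y_i = x_i - a_i.$$

Then

$$\frac{1}{C} = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-\frac12 Q}\,dy_1\,dy_2\cdots dy_k,\qquad\text{where } Q = \sum_{i,j=1}^{k}A_{ij}y_iy_j.$$

Now we can write

$$Q = A_{11}\Big(y_1 + \sum_{i=2}^{k}\frac{A_{1i}}{A_{11}}\,y_i\Big)^2 + \sum_{i,j=2}^{k}A^{(1)}_{ij}y_iy_j,$$

where

$$A^{(1)}_{ij} = A_{ij} - \frac{A_{1i}A_{1j}}{A_{11}},\qquad i,j = 2,\ldots,k.$$

Let

$$z_1 = y_1 + \sum_{i=2}^{k}\frac{A_{1i}}{A_{11}}\,y_i.$$

Then

$$\frac{1}{C} = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-\frac12\left[A_{11}z_1^2 + \sum_{i,j=2}^{k}A^{(1)}_{ij}y_iy_j\right]}\,dz_1\,dy_2\cdots dy_k.$$

The range of $z_1$ is $-\infty < z_1 < \infty$.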

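We should observe that the quadratic form $\sum_{i,j=2}^{k}A^{(1)}_{ij}y_iy_j$ is again positive definite, that is,

$$\sum_{i,j=2}^{k}A^{(1)}_{ij}s_is_j > 0$$

for real $s_i$ not all zero. For if there were such a set of s's for which this quadratic form were zero or negative, it would be implied that there is a set of t's, not all zero (namely $t_i = s_i$ for $i \ge 2$ and $t_1 = -\sum_{i=2}^{k}A_{1i}s_i/A_{11}$), for which

$$\sum_{i,j=1}^{k}A_{ij}t_it_j \le 0.$$

We continue this process, in turn letting

$$z_2 = y_2 + \sum_{i=3}^{k}\frac{A^{(1)}_{2i}}{A^{(1)}_{22}}\,y_i,$$

and correspondingly

$$A^{(2)}_{ij} = A^{(1)}_{ij} - \frac{A^{(1)}_{2i}A^{(1)}_{2j}}{A^{(1)}_{22}},\qquad i,j = 3,\ldots,k,$$

and so on.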

Each quadratic form in this sequence is positive definite by the foregoing argument. The integral becomes

$$\frac{1}{C} = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-\frac12\left[A_{11}z_1^2 + A^{(1)}_{22}z_2^2 + \cdots + A^{(k-1)}_{kk}z_k^2\right]}\,dz_1\cdots dz_k.$$

The final quadratic form is positive definite, so $A_{11} > 0$, $A^{(1)}_{22} > 0$, ..., $A^{(k-1)}_{kk} > 0$. Hence we can integrate on each z in turn, using the fact that

$$\int_{-\infty}^{\infty} e^{-\frac12 cz^2}\,dz = \sqrt{\frac{2\pi}{c}},\qquad c > 0.$$

Therefore, we get

$$\frac{1}{C} = \frac{(2\pi)^{k/2}}{\sqrt{A_{11}A^{(1)}_{22}\cdots A^{(k-1)}_{kk}}}.$$




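To find the value of $A_{11}A^{(1)}_{22}\cdots A^{(k-1)}_{kk}$ let us evaluate by Lagrange's method (known also as pivotal condensation) the determinant of $\|A_{ij}\|$:

$$|A| = \begin{vmatrix} A_{11} & A_{12} & \cdots & A_{1k}\\ A_{21} & A_{22} & \cdots & A_{2k}\\ \vdots & & & \vdots\\ A_{k1} & A_{k2} & \cdots & A_{kk}\end{vmatrix} = A_{11}\begin{vmatrix} 1 & \frac{A_{12}}{A_{11}} & \cdots & \frac{A_{1k}}{A_{11}}\\ A_{21} & A_{22} & \cdots & A_{2k}\\ \vdots & & & \vdots\\ A_{k1} & A_{k2} & \cdots & A_{kk}\end{vmatrix}.$$

If we subtract $A_{12}/A_{11}$ times the first column from the second, etc., we get

$$|A| = A_{11}\begin{vmatrix} 1 & 0 & \cdots & 0\\ A_{21} & A_{22}-\frac{A_{21}A_{12}}{A_{11}} & \cdots & A_{2k}-\frac{A_{21}A_{1k}}{A_{11}}\\ \vdots & & & \vdots\\ A_{k1} & A_{k2}-\frac{A_{k1}A_{12}}{A_{11}} & \cdots & A_{kk}-\frac{A_{k1}A_{1k}}{A_{11}}\end{vmatrix} = A_{11}\begin{vmatrix} A^{(1)}_{22} & \cdots & A^{(1)}_{2k}\\ \vdots & & \vdots\\ A^{(1)}_{k2} & \cdots & A^{(1)}_{kk}\end{vmatrix}.$$

Continuing in this way, we find the value of the determinant

$$|A| = A_{11}A^{(1)}_{22}\cdots A^{(k-1)}_{kk}.$$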
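The condensation just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the 3 by 3 example matrix are hypothetical, and exact rational arithmetic is used so that the pivots can be compared without rounding.

```python
from fractions import Fraction


def pivots_by_condensation(matrix):
    """Return the pivots A11, A22^(1), ..., Akk^(k-1) of a symmetric
    positive definite matrix, obtained by pivotal condensation."""
    a = [[Fraction(v) for v in row] for row in matrix]
    k = len(a)
    pivots = []
    for step in range(k):
        pivot = a[step][step]
        if pivot <= 0:
            raise ValueError("matrix is not positive definite")
        pivots.append(pivot)
        # Form A^(step+1)_ij = A_ij - A_i,step * A_step,j / pivot on the remaining block.
        for i in range(step + 1, k):
            for j in range(step + 1, k):
                a[i][j] -= a[i][step] * a[step][j] / pivot
    return pivots


# Hypothetical example matrix (symmetric, positive definite).
example = [[4, 2, 0],
           [2, 3, 1],
           [0, 1, 2]]
pivots = pivots_by_condensation(example)

determinant = Fraction(1)
for p in pivots:
    determinant *= p
```

For this matrix the pivots are 4, 2 and 3/2, and their product, 12, agrees with the determinant expanded directly. A non-positive pivot signals that the matrix is not positive definite.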


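Therefore, the constant we are seeking is

$$C = \frac{\sqrt{|A|}}{(2\pi)^{k/2}},$$

and the normal multivariate p. d. f. is

$$(b)\qquad f(x_1,\ldots,x_k) = \frac{\sqrt{|A|}}{(2\pi)^{k/2}}\, e^{-\frac12\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j)}.$$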



At this point we should notice some properties of positive definite quadratic forms and matrices. Since $|A| = A_{11}A^{(1)}_{22}\cdots A^{(k-1)}_{kk}$, $|A|$ is positive, for each of the factors is a positive constant. Corresponding to each principal minor of $\|A_{ij}\|$ of order h, there is a quadratic form in h variables. This quadratic form is again positive definite. For if there were a set of h t's (not all zero) making this form zero or negative, this set and the (k-h) other t's zero would do the same for $\sum_{i,j=1}^{k}A_{ij}t_it_j$. Since the determinant of a positive definite matrix is positive, it follows that every principal minor is positive. Conversely, if every principal minor is positive the matrix or the quadratic form is positive definite, for then each $A^{(i-1)}_{ii}$ is positive and the above process of reducing to a sum of squares may be carried out.

The transformation to the z's is linear, of the form

$$z_i = \sum_{j=1}^{k}b_{ij}y_j,$$

where $b_{ij} = 0$ for $j < i$. The process we have used proves the theorem that any positive definite quadratic form may be "diagonalized" by a real linear transformation. If we followed this by the transformation

$$w_i = \sqrt{A^{(i-1)}_{ii}}\; z_i,$$

we would have reduced the quadratic form to a sum of squares. This last is equivalent to

$$w_i = \sum_{j=1}^{k}c_{ij}y_j,$$

where $c_{ij} = \sqrt{A^{(i-1)}_{ii}}\; b_{ij}$, so that $c_{ij} = 0$ for $j < i$.


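Now we wish to show that the mean is

$$E(x_i) = a_i.$$

To do this we differentiate both sides of the following equation with respect to $a_i$:

$$\frac{\sqrt{|A|}}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-\frac12\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k = 1.$$

Differentiation of the above equation gives us

$$\frac{\sqrt{|A|}}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\Big[\sum_{j=1}^{k}A_{ij}(x_j-a_j)\Big]\, e^{-\frac12\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k = \sum_{j=1}^{k}A_{ij}\,E(x_j-a_j) = 0$$

for i = 1, 2, ..., k. This gives us k homogeneous linear equations in the k unknowns, $E(x_j-a_j)$. Since the determinant of the coefficient matrix, $|A|$, is not equal to zero, the only solution to these equations is that all the unknowns be zero:

$$E(x_j-a_j) = 0,\qquad E(x_j) = a_j,\qquad j = 1,2,\ldots,k.$$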


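Next we wish to show that the covariance of $x_i$ and $x_j$ is

$$E[(x_i-a_i)(x_j-a_j)] = \frac{\text{cofactor of } A_{ij} \text{ in } \|A_{ij}\|}{|A|} = A^{ij}.$$

To demonstrate this we differentiate with respect to $A_{ij}$ both sides of the identity

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-\frac12\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k = (2\pi)^{k/2}\,|A|^{-\frac12}.$$

Differentiating, we have

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\Big[\frac{\delta_{ij}-2}{2}(x_i-a_i)(x_j-a_j)\Big]\, e^{-\frac12\sum A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k = (2\pi)^{k/2}\Big(-\frac12\Big)|A|^{-\frac32}(2-\delta_{ij})(\text{cofactor of } A_{ij}) = \frac{\delta_{ij}-2}{2}\,(2\pi)^{k/2}\,|A|^{-\frac12}A^{ij},$$

where $\delta_{ij} = 1$ if $i = j$, and $= 0$ if $i \ne j$.

If we multiply both sides of this equation by $\dfrac{2}{\delta_{ij}-2}\cdot\dfrac{|A|^{\frac12}}{(2\pi)^{k/2}}$, the left hand side is

$$E[(x_i-a_i)(x_j-a_j)],$$

and the right hand side is $A^{ij}$. So we have

$$\sigma_i^2 = A^{ii},\qquad \sigma_i\sigma_j\rho_{ij} = A^{ij},\qquad i,j = 1,2,\ldots,k.$$

We may summarize as follows:

Theorem (A): If $x_1, x_2, \ldots, x_k$ are distributed according to the normal multivariate distribution (b), then $E(x_i) = a_i$, $\sigma_i^2 = A^{ii}$, and $\sigma_i\sigma_j\rho_{ij} = A^{ij}$.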

Now let us find the joint marginal distribution of $x_1, x_2, \ldots, x_r$ $(r < k)$. To do this we integrate out $x_{r+1}, \ldots, x_k$, getting

$$g(x_1,\ldots,x_r) = \frac{\sqrt{|B|}}{(2\pi)^{r/2}}\, e^{-\frac12\sum_{u,v=1}^{r}B_{uv}(x_u-a_u)(x_v-a_v)}.$$

We can see this is true if we recall the procedure used in evaluating C. If at any stage we had integrated out the z's, we would have had remaining a normal multivariate distribution of the x's.

We wish to find an expression of the $B_{uv}$ in terms of the $A_{ij}$. We know that the value of $E[(x_u-a_u)(x_v-a_v)]$ is $A^{uv}$ if found from the original distribution and is $B^{uv}$ if found from the marginal distribution. But these two expressions must be equal. Therefore

$$A^{uv} = B^{uv},\qquad u,v = 1,2,\ldots,r.$$

Hence, to derive $\|B_{uv}\|$ from $\|A_{ij}\|$ we delete from $\|A^{ij}\|$ the last k - r rows and columns (obtaining $\|B^{uv}\|$) and take the inverse of this matrix.

In particular, suppose r = 1. We find the distribution of $x_1$ to be

$$g(x_1) = \sqrt{\frac{B_{11}}{2\pi}}\; e^{-\frac12 B_{11}(x_1-a_1)^2},$$

where

$$B_{11} = \frac{1}{A^{11}} = \frac{|A|}{\mathfrak{A}_{11}},$$

where $\mathfrak{A}_{11}$ = cofactor of $A_{11}$ in $|A|$. Thus,

$$\sigma_1^2 = A^{11} = \frac{\mathfrak{A}_{11}}{|A|}.$$

Similar distributions exist for the other x's.


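This result gives us a simple method of finding the m. g. f. of $(x_1-a_1), (x_2-a_2), \ldots, (x_k-a_k)$ defined by

$$\phi(\theta_1,\ldots,\theta_k) = E\Big[e^{\sum_{i=1}^{k}\theta_i(x_i-a_i)}\Big].$$

Theorem (B): If $x_1, x_2, \ldots, x_k$ are distributed according to the normal multivariate law (b), the m. g. f. of $(x_1-a_1), (x_2-a_2), \ldots, (x_k-a_k)$ is

$$\phi(\theta_1,\ldots,\theta_k) = e^{\frac12\sum_{i,j=1}^{k}A^{ij}\theta_i\theta_j}.$$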




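The argument leading to Theorem (B) may be readily applied to show that any r $(r \le k)$ linearly independent linear functions of $(x_i-a_i)$, i = 1, 2, ..., k, are distributed according to a normal r-variate distribution. To show this, let

$$L_p = \sum_{i=1}^{k}l_{pi}(x_i-a_i),\qquad p = 1,2,\ldots,r,$$

be the r linearly independent linear functions, i. e., such that there exists no set of constants $c_p$ (p = 1, 2, ..., r) not all zero for which $\sum_{p=1}^{r}c_pl_{pi} = 0$, i = 1, 2, ..., k. Let $\phi(\theta_1,\theta_2,\ldots,\theta_r)$ be the m. g. f. of the $L_p$, i. e.,

$$\phi(\theta_1,\ldots,\theta_r) = \frac{\sqrt{|A|}}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{\sum_{p}\theta_pL_p - \frac12\sum_{i,j}A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k = \frac{\sqrt{|A|}}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{\sum_{i}\tau_i(x_i-a_i) - \frac12\sum_{i,j}A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k,$$

where $\tau_i = \sum_{p=1}^{r}\theta_pl_{pi}$. The value of this integral is given by Theorem (B) with $\theta_i$ replaced by $\tau_i$, i = 1, 2, ..., k. Thus

$$\phi(\theta_1,\ldots,\theta_r) = e^{\frac12\sum_{p,q=1}^{r}B^{pq}\theta_p\theta_q},$$

where

$$B^{pq} = \sum_{i,j=1}^{k}A^{ij}l_{pi}l_{qj}.$$

Now consider the quadratic form

$$\sum_{p,q=1}^{r}B^{pq}\theta_p\theta_q = \sum_{i,j=1}^{k}A^{ij}\Big(\sum_{p=1}^{r}\theta_pl_{pi}\Big)\Big(\sum_{q=1}^{r}\theta_ql_{qj}\Big).$$

If $\|A^{ij}\|$ is positive definite and if the $L_p$ are linearly independent, then clearly $\|B^{pq}\|$ is positive definite. We therefore have

Theorem (C): Let $x_1, x_2, \ldots, x_k$ be distributed according to the normal multivariate law (b). Let $L_p = \sum_{i=1}^{k}l_{pi}(x_i-a_i)$ $(p = 1,2,\ldots,r \le k)$ be linearly independent linear functions of the $x_i$. Then the $L_p$ are distributed according to the normal r-variate law

$$\frac{\sqrt{|B_{pq}|}}{(2\pi)^{r/2}}\, e^{-\frac12\sum_{p,q=1}^{r}B_{pq}L_pL_q}\,dL_1\cdots dL_r,$$

where $\|B_{pq}\|$ is the inverse of the matrix $\|B^{pq}\|$, and

$$B^{pq} = \sum_{i,j=1}^{k}A^{ij}l_{pi}l_{qj}.$$

Next let us find the conditional p. d. f.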


$$f(x_1\,|\,x_2,\ldots,x_k) = \frac{f(x_1,x_2,\ldots,x_k)}{g(x_2,\ldots,x_k)},$$

where $g(x_2,\ldots,x_k)$ is the marginal distribution of the last k - 1 variables. Using the marginal distribution found above, where now $\|B^{pq}\| = \|A^{pq}\|$ (p, q = 2, ..., k) and also $\|B_{pq}\| = \|A^{pq}\|^{-1}$, we get

$$f(x_1\,|\,x_2,\ldots,x_k)\,dx_1 = \frac{\dfrac{\sqrt{|A|}}{(2\pi)^{k/2}}\, e^{-\frac12\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\,dx_2\cdots dx_k}{\dfrac{\sqrt{|B|}}{(2\pi)^{(k-1)/2}}\, e^{-\frac12\sum_{p,q=2}^{k}B_{pq}(x_p-a_p)(x_q-a_q)}\,dx_2\cdots dx_k} = \sqrt{\frac{A_{11}}{2\pi}}\; e^{-\frac12A_{11}\left[(x_1-a_1)+\sum_{p=2}^{k}\frac{A_{1p}}{A_{11}}(x_p-a_p)\right]^2}\,dx_1.$$

Therefore, for fixed values of $x_2, \ldots, x_k$, we have $x_1$ normally distributed with variance $1/A_{11}$ and mean

$$E(x_1\,|\,x_2,\ldots,x_k) = a_1 - \frac{1}{A_{11}}\sum_{p=2}^{k}A_{1p}(x_p-a_p).$$

The regression function for the multivariate normal distribution is linear.
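As a small illustration of the last result, the conditional mean and variance of $x_1$ can be read directly from the first row of $\|A_{ij}\|$. The sketch below uses a hypothetical function name and a hypothetical bivariate example ($\sigma_1 = 2$, $\sigma_2 = 3$, $\rho = 1/2$, means 1 and 2); these values are assumptions made only for the check.

```python
from fractions import Fraction


def conditional_of_first(A, a, x_rest):
    """Mean and variance of x1 given x2, ..., xk.

    A      -- matrix ||A_ij|| of the quadratic form in the exponent
    a      -- list of means (a1, ..., ak)
    x_rest -- fixed values (x2, ..., xk)
    """
    a11 = A[0][0]
    shift = sum(A[0][p] * (x_rest[p - 1] - a[p]) for p in range(1, len(a)))
    return a[0] - shift / a11, 1 / a11


# Hypothetical bivariate case: A11 = 1/3, A12 = -1/9, A22 = 4/27.
A = [[Fraction(1, 3), Fraction(-1, 9)],
     [Fraction(-1, 9), Fraction(4, 27)]]
mean, variance = conditional_of_first(A, [Fraction(1), Fraction(2)], [Fraction(5)])
```

For k = 2 this reduces to the bivariate regression line: the mean is $a_1 + \rho(\sigma_1/\sigma_2)(x_2-a_2) = 2$ and the variance is $\sigma_1^2(1-\rho^2) = 3$.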




3.3 Pearson System of Distribution Functions

Thus far we have dealt with special distributions which arise under certain specified conditions. Several attempts have been made to develop a general system of distributions which can describe or closely approximate the true distribution of a random variable. One of these systems, derived by Karl Pearson, is based upon the differential equation

$$(a)\qquad \frac{dy}{dx} = \frac{(x+a)\,y}{b+cx+dx^2}.$$

Depending on the values given the constants a, b, c, and d we get a wide variety of distribution functions as solutions of the differential equation. We get J-shaped and U-shaped curves, symmetrical and skewed curves, distributions with finite and infinite ranges.

The normal distribution may be obtained as a solution of the differential equation for c = d = 0 and b < 0. This function is Type VII of Pearson's twelve types of solutions.


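Another special case we shall be interested in is d = 0. Then the equation is

$$\frac{dy}{dx} = \frac{(x+a)\,y}{b+cx}.$$

Writing this as

$$\frac{dy}{y} = \frac{dx}{c} + \Big(\frac{ca-b}{c}\Big)\frac{dx}{b+cx},$$

we see the solution is

$$y = K\,e^{x/c}\Big(x+\frac{b}{c}\Big)^{(ca-b)/c^2}.$$

Changing the constants, we have

$$y = K\,e^{-\beta x}(x+\alpha)^{\nu-1},\qquad \beta > 0,\ \nu > 0,$$

defined for $-\alpha < x < \infty$, where K is chosen so $K\int_{-\alpha}^{\infty}e^{-\beta x}(x+\alpha)^{\nu-1}\,dx = 1$. This is the Pearson Type III distribution.

To determine K we make the indicated integration. Let

$$z = \beta(x+\alpha).$$

Then

$$K\int_{-\alpha}^{\infty}e^{-\beta x}(x+\alpha)^{\nu-1}\,dx = K'\int_{0}^{\infty}e^{-z}z^{\nu-1}\,dz,$$

where $K' = K e^{\alpha\beta}/\beta^{\nu}$. Therefore we choose K' so

$$\frac{1}{K'} = \int_{0}^{\infty}e^{-z}z^{\nu-1}\,dz.$$

This last integral is an important function of the exponent $\nu$ denoted by $\Gamma(\nu)$, the gamma function of $\nu$.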


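To evaluate $\Gamma(\nu)$ for $\nu > 1$ we integrate by parts, using $z^{\nu-1}$ as u and $e^{-z}\,dz$ as dv:

$$\Gamma(\nu) = \Big[-z^{\nu-1}e^{-z}\Big]_{0}^{\infty} + (\nu-1)\int_{0}^{\infty}e^{-z}z^{\nu-2}\,dz = 0 + (\nu-1)\,\Gamma(\nu-1).$$

This gives us a recursion or a functional equation for $\Gamma(\nu)$. If $\nu$ is an integer,

$$(b)\qquad \Gamma(\nu) = (\nu-1)(\nu-2)\cdots 2\cdot 1\cdot\Gamma(1).$$

Since

$$\Gamma(1) = \int_{0}^{\infty}e^{-z}\,dz = 1,$$

we have for $\nu$ an integer,

$$\Gamma(\nu) = (\nu-1)!.$$

It is also easy to evaluate $\Gamma(\nu)$ if $\nu$ is an integer plus $\frac12$. For

$$\Gamma(\tfrac12) = \int_{0}^{\infty}z^{-\frac12}e^{-z}\,dz = 2\int_{0}^{\infty}e^{-t^2}\,dt = \sqrt{\pi},$$

and we have

$$\Gamma(\nu) = (\nu-1)(\nu-2)\cdots\tfrac32\cdot\tfrac12\,\sqrt{\pi}.$$

In general for $\nu > 0$, $\Gamma(\nu)$ has a finite value, and in any interval (a, b) of values of $\nu$ (0 < a < b), $\Gamma(\nu)$ is continuous. $\Gamma(\nu)$ has a minimum for $\nu = 1.46163$.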
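The recursion $\Gamma(\nu) = (\nu-1)\Gamma(\nu-1)$, together with the two starting values $\Gamma(1) = 1$ and $\Gamma(\frac12) = \sqrt{\pi}$, is enough to compute the gamma function at integers and half-integers. The following sketch (the function name is an assumption) compares the result with the standard library's `math.gamma`.

```python
import math


def gamma_by_recursion(nu):
    """Gamma(nu) for nu a positive integer or a positive integer plus one half."""
    twice = round(2 * nu)
    if twice <= 0 or abs(2 * nu - twice) > 1e-12:
        raise ValueError("nu must be a positive multiple of 1/2")
    # Starting values: Gamma(1) = 1 and Gamma(1/2) = sqrt(pi).
    if twice % 2 == 0:
        t, value = 1.0, 1.0
    else:
        t, value = 0.5, math.sqrt(math.pi)
    # Apply Gamma(t + 1) = t * Gamma(t) until t reaches nu.
    while t < nu - 1e-12:
        value *= t
        t += 1.0
    return value


gamma_5 = gamma_by_recursion(5)        # (5 - 1)! = 24
gamma_5_2 = gamma_by_recursion(2.5)    # (3/2)(1/2) sqrt(pi)
```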



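With this determination of K, the Pearson Type III distribution is

$$(d)\qquad \frac{\beta^{\nu}}{\Gamma(\nu)}\, e^{-\beta(x+\alpha)}(x+\alpha)^{\nu-1}.$$

This distribution for the case $\alpha = 0$ and $\beta = \frac12$ is known as the $\chi^2$-distribution with $2\nu$ degrees of freedom and is one of the most important distributions in statistics. It and certain applications will be studied in detail in Chapter V.

It will be convenient at this point to find the moment-generating function of the distribution (d) when $\alpha = 0$. We have

$$\phi(\theta) = E(e^{\theta x}) = \frac{\beta^{\nu}}{\Gamma(\nu)}\int_{0}^{\infty}e^{\theta x}e^{-\beta x}x^{\nu-1}\,dx = \frac{\beta^{\nu}}{\Gamma(\nu)(\beta-\theta)^{\nu}}\int_{0}^{\infty}e^{-(\beta-\theta)x}[(\beta-\theta)x]^{\nu-1}\,d[(\beta-\theta)x] = \frac{\beta^{\nu}}{(\beta-\theta)^{\nu}}.$$

Therefore, for $\beta-\theta > 0$, we have

$$\phi(\theta) = \Big(1-\frac{\theta}{\beta}\Big)^{-\nu}.$$

For $\beta = \frac12$ we have

$$(e)\qquad \phi(\theta) = (1-2\theta)^{-\nu},$$

which is the m. g. f. for the $\chi^2$-distribution with $2\nu$ degrees of freedom.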

Next let us consider the solution of the differential equation (a) when $dx^2 + cx + b$ has two real roots, say g and h (g < h), both different from -a. Then, using partial fractions we can write the equation as

$$\frac{1}{y}\frac{dy}{dx} = \frac{x+a}{d(x-g)(x-h)} = \frac{A}{x-g} - \frac{B}{h-x},$$

where A and B are functions of g, h, a and d which we do not need to determine.

The solution of this equation is

$$y = C(x-g)^{A}(h-x)^{B},$$

where C is a constant of integration. We wish to determine C so that

$$C\int_{g}^{h}(x-g)^{A}(h-x)^{B}\,dx = 1.$$

If we let $x = g + (h-g)v$, the integral becomes

$$C\,(h-g)^{A+B+1}\int_{0}^{1}v^{A}(1-v)^{B}\,dv = 1.$$

Because we will need the result later, let us evaluate the integral, namely

$$\int_{0}^{1}v^{n_1-1}(1-v)^{n_2-1}\,dv,$$

which is known as the Beta Function of $n_1$ and $n_2$, $B(n_1,n_2)$. We wish to show that this is

$$(f)\qquad B(n_1,n_2) = \frac{\Gamma(n_1)\,\Gamma(n_2)}{\Gamma(n_1+n_2)}.$$

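To do this we consider the product $\Gamma(n_1)\Gamma(n_2)$, where

$$\Gamma(n_1) = \int_{0}^{\infty}x^{n_1-1}e^{-x}\,dx,$$

and similarly for $\Gamma(n_2)$. Letting $x = s^2$, we get

$$\Gamma(n_1) = 2\int_{0}^{\infty}s^{2n_1-1}e^{-s^2}\,ds.$$

So we can express $\Gamma(n_1)\Gamma(n_2)$ as the double integral

$$\Gamma(n_1)\Gamma(n_2) = 4\int_{0}^{\infty}\int_{0}^{\infty}s^{2n_1-1}\,t^{2n_2-1}\,e^{-(s^2+t^2)}\,ds\,dt.$$

If we change to polar coordinates

$$s = r\cos\theta,\qquad t = r\sin\theta,$$

this integral over the positive quadrant of the st plane becomes

$$4\int_{0}^{\pi/2}\int_{0}^{\infty}\cos^{2n_1-1}\theta\,\sin^{2n_2-1}\theta\; r^{2(n_1+n_2)-1}e^{-r^2}\,dr\,d\theta = \Big[2\int_{0}^{\infty}r^{2(n_1+n_2)-1}e^{-r^2}\,dr\Big]\Big[2\int_{0}^{\pi/2}\cos^{2n_1-1}\theta\,\sin^{2n_2-1}\theta\,d\theta\Big] = \Gamma(n_1+n_2)\,B(n_1,n_2),$$

where the last step uses the substitution $v = \cos^2\theta$ in the second factor, thus proving our desired result.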
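A direct numerical check of relation (f) is easy to carry out. The sketch below integrates $v^{n_1-1}(1-v)^{n_2-1}$ by the midpoint rule and compares the result with the ratio of gamma functions; the function names, the step count, and the sample arguments 2.5 and 3 are illustrative assumptions.

```python
import math


def beta_by_midpoint(n1, n2, steps=20000):
    """Approximate the Beta integral over (0, 1) by the midpoint rule."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        total += v ** (n1 - 1) * (1.0 - v) ** (n2 - 1)
    return total * h


def beta_by_gamma(n1, n2):
    """Right-hand side of (f)."""
    return math.gamma(n1) * math.gamma(n2) / math.gamma(n1 + n2)


numeric = beta_by_midpoint(2.5, 3.0)
closed_form = beta_by_gamma(2.5, 3.0)
```

The two values agree to well within the accuracy of the midpoint rule. The rule is adequate here because both exponents exceed 1, so the integrand is bounded on the whole interval.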

Therefore, the Type I distribution may be written in the general form

$$\frac{\Gamma(A+B+2)\,(x-g)^{A}(h-x)^{B}}{\Gamma(A+1)\,\Gamma(B+1)\,(h-g)^{A+B+1}},\qquad (g \le x \le h).$$


There are twelve types of Pearson distributions. Below are graphed several 
representative ones. 


Figure 5

3.4 The Gram-Charlier Series

Another rather general system of distribution functions, known as the Gram-Charlier Series, is based upon the normal distribution and its derivatives. Instead of a number of distributions of different functional forms, this system is composed of an infinite series of terms of a certain kind. Charlier gave a theoretical argument for this system from his development of the hypothesis of elementary errors. We shall regard it, however, as a distribution which has been found satisfactory for fitting or "smoothing" certain empirical distributions.

The generator of this series is the Gaussian or normal distribution. Let

$$\phi_0(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac12x'^2},$$

and let

$$\phi_n(x) = \frac{d^n\phi_0}{dx'^n},\qquad n = 1,2,\ldots,$$

where $x' = \dfrac{x-a}{\sigma}$. Then the Gram-Charlier series is

$$(c)\qquad f(x) = b_0\phi_0(x) + b_1\phi_1(x) + b_2\phi_2(x) + \cdots = \phi_0(x)\big[b_0 - b_1H_1(x') + b_2H_2(x') - \cdots + (-1)^nb_nH_n(x') + \cdots\big],$$

where $H_n$ is the nth Hermite polynomial

$$H_n(x) = x^n - \frac{n(n-1)}{2}\,x^{n-2} + \frac{n(n-1)(n-2)(n-3)}{2\cdot4}\,x^{n-4} - \cdots.$$

By choosing the a, $\sigma$, and b's properly we obtain a wide variety of distribution functions which are asymptotic to the x-axis at both ends of the range.

Since

$$\int_{-\infty}^{\infty}f(x)\,dx = b_0,$$

we choose $b_0 = 1$. The mean is

$$\int_{-\infty}^{\infty}x\,f(x)\,dx = -\sigma b_1 + a.$$

If a in the expression for x' is taken as the mean of the distribution f(x), then $b_1 = 0$. Taking a as the mean of the distribution we find

$$\int_{-\infty}^{\infty}(x-a)^2f(x)\,dx = \sigma^2 + 2\sigma^2b_2.$$

If $\sigma$, in the expression for x', is chosen as the standard deviation of f(x) then $b_2 = 0$. It is easily found that the third and fourth moments are

$$(d)\qquad \mu_3 = -6\sigma^3b_3,$$

$$(e)\qquad \mu_4 = \sigma^4(3+24b_4).$$

Similarly, higher moments can be found. Equations (d) and (e) and similar ones for higher moments give equations for determining the b's in terms of moments. The problem of fitting distributions by the use of moments, however, will be discussed in §6.4.


CHAPTER IV 


SAMPLING THEORY 


4.1 General Remarks 

Suppose x is a random variable with c. d. f. F(x). In accordance with the statement made at the end of §2.3, we define a random sample $O_n$ of size n of values of x from a population with c. d. f. F(x) as a set of n random variables $x_1, x_2, \ldots, x_n$ with c. d. f.

$$(a)\qquad F(x_1)\cdot F(x_2)\cdots F(x_n).$$

We note that a random sample consists of statistically independent random variables all having the same c. d. f. It is often convenient to think of $x_1$ as the value of x in the first "drawing" from the population, $x_2$ as the value of x in the second "drawing", etc.

In the theory of sampling, we are usually interested in c. d. f.'s of one or more functions of the n random variables comprising the sample. Thus, suppose $g(x_1,x_2,\ldots,x_n)$ is such a sample function (Borel measurable). We are interested in determining the c. d. f. of g, i. e., $\Pr[g(x_1,x_2,\ldots,x_n) \le g]$, the value of which is obtained by performing the Stieltjes integration

$$(b)\qquad \int_{R}dF(x_1)\,dF(x_2)\cdots dF(x_n),$$

where R is the region in the n-dimensional space of the x's for which $g(x_1,x_2,\ldots,x_n) \le g$.

Similarly, if $g_i(x_1,x_2,\ldots,x_n)$ (i = 1, 2, ..., k), $k \le n$, are k Borel measurable functions, we are interested in determining $\Pr(g_i(x_1,x_2,\ldots,x_n) \le g_i\ (i = 1,2,\ldots,k))$.

The random variable x may be a vector with r components, say $x^{(1)}, x^{(2)}, \ldots, x^{(r)}$, with c. d. f. $F(x^{(1)}, x^{(2)}, \ldots, x^{(r)})$. In this case the sample $O_n$ would consist of n random vectors $(x^{(1)}_\alpha, x^{(2)}_\alpha, \ldots, x^{(r)}_\alpha)$, $\alpha = 1,2,\ldots,n$ (a total of nr random variables), with c. d. f.

$$\prod_{\alpha=1}^{n}F(x^{(1)}_\alpha, x^{(2)}_\alpha, \ldots, x^{(r)}_\alpha).$$

Again, the sampling problem is to determine the c. d. f. of one or more (Borel measurable) functions of the nr random variables involved. For example, here one may wish to determine the probability theory of such functions as

$$\sum_{\alpha=1}^{n}(x^{(i)}_\alpha-\bar{x}^{(i)})(x^{(j)}_\alpha-\bar{x}^{(j)}),$$

where $\bar{x}^{(i)} = \frac{1}{n}\sum_{\alpha=1}^{n}x^{(i)}_\alpha$, i, j = 1, 2, ..., r, and other symmetrical functions.

In mathematical statistics one is usually interested in relatively simple sample functions, such as averages, ratios, sums of squares, correlation coefficients, etc. One is able to obtain simple expressions for sampling distributions for such functions only in certain special cases which will be considered in this and in later chapters. However, one is able to obtain moments of some of the simpler g functions such as averages, average sum of squares, etc., under broader conditions. Some of these cases will also be considered.

4.2 Application of Theorems on Mean Values to Sampling Theory

This section consists of the application of results of §§2.71 - 2.75 to cases of interest in sampling theory. No assumptions are made about the population distribution except the existence of first and second moments.

4.21 Distribution of Sample Mean

Let $O_n:(x_1,x_2,\ldots,x_n)$ be a sample from a population with an arbitrary distribution for which the first moment $\mu_1' = a$ exists. Let $\bar{x}$ be the mean of the sample,

$$\bar{x} = \sum_{i=1}^{n}x_i/n.$$

Then from equation (b) of §2.74, we have that the expected value of $\bar{x}$ is

$$E(\bar{x}) = \sum_{i=1}^{n}a_i/n = a,$$

since $a_i = E(x_i) = a$. If furthermore the population distribution F(x) has a finite variance $\sigma^2$, then since each $x_i$ has the c. d. f. $F(x_i)$, and the $x_i$ are mutually independent, we get from (d) of §2.74 that the variance of $\bar{x}$ is

$$\sigma_{\bar{x}}^2 = \sum_{i=1}^{n}\sigma^2/n^2 = \sigma^2/n.$$

We gather these results into

Theorem (A): If $\bar{x}$ is the mean of a sample of size n from a population with arbitrary c. d. f. F(x), then if the mean a of F(x) exists,

$$E(\bar{x}) = a,$$

and if F(x) has finite variance $\sigma^2$, the variance of $\bar{x}$ is

$$\sigma_{\bar{x}}^2 = \sigma^2/n.$$





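Theorem (B): If $\bar{x}$ is the mean of a sample of size n from a population with c. d. f. F(x) having mean a and finite variance $\sigma^2$, then as $n \to \infty$ the c. d. f. of $z = \sqrt{n}\,(\bar{x}-a)/\sigma$ converges to

$$(c)\qquad \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z}e^{-\frac12t^2}\,dt$$

uniformly in z.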

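We make the proof for the case where the m. g. f. $\psi(\theta)$ of the original distribution exists for $|\theta| < h$, $h > 0$. Then for $|\theta| < h$, the m. g. f. $\phi(\theta)$ of $y = (x-a)/\sigma$ also exists, for $\phi(\theta) = e^{-a\theta/\sigma}\psi(\theta/\sigma)$. Finally, let $\Phi(\theta)$ be the m. g. f. of z:

$$\Phi(\theta) = E(e^{\theta z}) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\exp\Big[\theta\sum_{i=1}^{n}(x_i-a)/\sqrt{n}\,\sigma\Big]\,dF(x_1)\,dF(x_2)\cdots dF(x_n) = \Big[\int_{-\infty}^{\infty}\exp\big(\theta(x-a)/\sqrt{n}\,\sigma\big)\,dF(x)\Big]^n = \big[\phi(\theta/\sqrt{n})\big]^n.$$

By Taylor's theorem,

$$\phi(u) = \phi(0) + u\,\phi'(0) + \tfrac12u^2\phi''(u_1),$$

where $0 < u_1 < u < h$ if $u > 0$, and $-h < u < u_1 < 0$ if $u < 0$. $\phi''(u)$ is continuous at $u = 0$, hence $\phi''(u) = \phi''(0) + \eta(u)$, where $\eta(u) \to 0$ as $u \to 0$. We recall that $\phi^{(i)}(0)$ is the i-th moment of y about the origin, so $\phi(0) = 1$, $\phi'(0) = 0$, $\phi''(0) = 1$, and

$$(d)\qquad \Phi(\theta) = \Big\{1 + \frac{\theta^2}{2n}\Big[1+\eta\Big(\frac{\theta_1}{\sqrt{n}}\Big)\Big]\Big\}^n,$$

where $0 < \theta_1 < \theta < h\sqrt{n}$ or $-h\sqrt{n} < \theta < \theta_1 < 0$. Now choose any $\theta$ and hold it fixed. (d) is valid for $n > \theta^2/h^2$. Letting $n \to \infty$, for every fixed $\theta$,

$$\lim_{n\to\infty}\Phi(\theta) = e^{\frac12\theta^2},$$

which is the m. g. f. for N(0,1). Therefore from Theorem (C) of §2.91, the limiting distribution of z is given by (c) above.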
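The convergence of $[\phi(\theta/\sqrt{n})]^n$ to $e^{\theta^2/2}$ can be watched numerically for a particular population. The sketch below assumes, purely for illustration, an exponential population with mean 1 and variance 1, whose m. g. f. is $\psi(\theta) = 1/(1-\theta)$ for $\theta < 1$; the function name is hypothetical.

```python
import math


def mgf_standardized_mean(theta, n):
    """M.g.f. of z = sqrt(n) (xbar - a) / sigma for an exponential
    population with a = sigma = 1, i.e. [phi(theta / sqrt(n))]^n
    with phi(u) = exp(-u) / (1 - u)."""
    u = theta / math.sqrt(n)
    if u >= 1.0:
        raise ValueError("need theta < sqrt(n) for the m.g.f. to exist")
    return (math.exp(-u) / (1.0 - u)) ** n


theta = 1.0
limit = math.exp(0.5 * theta ** 2)            # m.g.f. of N(0, 1) at theta
values = [mgf_standardized_mean(theta, n) for n in (10, 100, 10000, 1000000)]
errors = [abs(v - limit) for v in values]
```

The errors shrink steadily as n grows, roughly like $1/\sqrt{n}$, which reflects the third-moment term neglected in the limit.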

While the above proof based on the generating function can be shortened, we have purposely given it in a way which permits of generalization to distributions of which it is assumed only that the second moment exists. In this general case one employs instead of the m. g. f. the characteristic function $\Phi(t)$ of the distribution, which is related to the generating function $\phi(\theta)$ by $\Phi(t) = \phi(it)$. This always exists for all real t. The argument follows the above step by step and at the end one appeals to a theorem analogous to (C) of §2.91, which states that if the limit of the characteristic function is the characteristic function of some continuous c. d. f. F*(x) then the limit of the c. d. f. is F*(x) uniformly for all x.

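4.22 Expected Value of Sample Variance

For the sample $O_n:(x_1,x_2,\ldots,x_n)$, call S the sum of squared deviations from the sample mean,

$$S = \sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}x_i^2 - n\bar{x}^2.$$

Recalling that E is a linear operator, we get

$$E(S) = \sum_{i=1}^{n}E(x_i^2) - nE(\bar{x}^2).$$

Now if the population distribution F(x) has mean a and finite variance $\sigma^2$,

$$E(x_i^2) = [\mu_2' \text{ of } F(x)] = \sigma^2 + a^2,$$

$$E(\bar{x}^2) = [\mu_2' \text{ of c. d. f. of } \bar{x}] = \sigma_{\bar{x}}^2 + a^2 = a^2 + \sigma^2/n.$$

Hence

$$E(S) = n(\sigma^2+a^2) - n(a^2+\sigma^2/n) = (n-1)\sigma^2.$$

We note that $E(S/n) \ne \sigma^2$, but if we define

$$s^2 = S/(n-1),$$

then

$$E(s^2) = \sigma^2.$$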
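The identity $E(S) = (n-1)\sigma^2$ holds for any population with finite variance, and for a small discrete population it can be verified exactly by enumerating every possible sample. The following sketch does so with rational arithmetic; the three-point population and the function name are hypothetical choices made for the check.

```python
from fractions import Fraction
from itertools import product


def expected_sum_of_squares(values, probs, n):
    """Exact E(S) for samples of size n drawn independently from a
    discrete population with the given values and probabilities."""
    total = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        sample = [Fraction(values[i]) for i in idx]
        weight = Fraction(1)
        for i in idx:
            weight *= probs[i]
        mean = sum(sample) / n
        total += weight * sum((x - mean) ** 2 for x in sample)
    return total


# Hypothetical population: values 0, 1, 4 with probabilities 1/2, 1/4, 1/4.
values = [0, 1, 4]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
a = sum(p * v for p, v in zip(probs, values))
sigma2 = sum(p * (v - a) ** 2 for p, v in zip(probs, values))

n = 3
expected_S = expected_sum_of_squares(values, probs, n)
```

Dividing `expected_S` by n - 1 returns the population variance exactly, while dividing by n gives a value that is too small by the factor (n-1)/n.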


4.3 Sampling from a Finite Population

Suppose that a population has a finite number N of elements, each characterized by a number $x^{(i)}$, i = 1, 2, ..., N, and that we draw a random sample $O_n:(x_1,x_2,\ldots,x_n)$ without replacement. The sample may be represented by a point $(x_1,x_2,\ldots,x_n)$ in n dimensions, the possible values of $x_\alpha$ being $x^{(1)}, x^{(2)}, \ldots, x^{(N)}$, $\alpha = 1,2,\ldots,n$. To simplify the discussion, let us assume that the values of the $x^{(i)}$ are distinct, i = 1, 2, ..., N. Then $\Pr(x_\alpha = x_\beta \text{ for } \alpha \ne \beta) = 0$. Hence we may think of the range of the sample point being all points of the lattice $x_\alpha = x^{(1)}, x^{(2)}, \ldots, x^{(N)}$, $\alpha = 1,2,\ldots,n$, but we must ascribe to any point for which $x_\alpha = x_\beta$, $\alpha \ne \beta$, the probability zero. By a random sample we mean that all points of this lattice, barring the exceptional points just mentioned, have the same probability p. To enumerate the points with probability p, we note that to obtain such a point, we may choose $x_1$ in N ways, $x_2$ in N-1 ways, ..., $x_n$ in N-n+1 ways. The number of points with probability p is thus N(N-1)...(N-n+1). Since the total probability of the points of the lattice must add up to unity, we have

$$(a)\qquad p = [N(N-1)\cdots(N-n+1)]^{-1}.$$

The probability ascribed to the lattice point $(x_1,x_2,\ldots,x_n)$ is therefore 0 if any two $x_\alpha$ are equal, and p if all $x_\alpha$ are distinct.


Define the mean a and the variance $\sigma^2$ of the population from

$$a = \sum_{i=1}^{N}x^{(i)}/N,\qquad \sigma^2 = \sum_{i=1}^{N}(x^{(i)}-a)^2/N.$$

Here, we shall consider the problem of determining the mean and variance of the mean of a random sample from this population. Let $\bar{x}$ be the sample mean,

$$\bar{x} = \sum_{\alpha=1}^{n}x_\alpha/n.$$

We note that the $x_\alpha$ are not independent (it will later be seen that the correlation between $x_\alpha$ and $x_\beta$ is not zero), but we may nevertheless use the formula (f) of §2.74, as pointed out there. Thus

$$(b)\qquad E(\bar{x}) = \sum_{\alpha=1}^{n}E(x_\alpha)/n.$$

To calculate $E(x_\alpha)$ we desire the marginal distribution of $x_\alpha$. Suppose $\alpha = 1$. Then $\Pr(x_1 = x^{(i)})$ is the sum of the probability over all lattice points for which $x_1 = x^{(i)}$, that is, it is p times the number of lattice points for which $x_1 = x^{(i)}$ and no two of $x_1, x_2, \ldots, x_n$ are equal. To compute this number note that we may choose $x_1$ in only one way, $x_1 = x^{(i)}$, then $x_2$ in N-1 ways, $x_2 \ne x^{(i)}$, then $x_3$ in N-2 ways, $x_3 \ne x^{(i)}$ or $x_2$, etc.; so the desired number is (N-1)(N-2)...(N-n+1). The marginal probability of $x_1$ is thus seen to be

$$\Pr(x_1 = x^{(i)}) = p\,(N-1)(N-2)\cdots(N-n+1) = 1/N$$

from (a). We get

$$E(x_1) = \sum_{i=1}^{N}x^{(i)}\Pr(x_1 = x^{(i)}) = \sum_{i=1}^{N}x^{(i)}/N = a.$$

Similarly,

$$E(x_\alpha) = a,\qquad \alpha = 1,2,\ldots,n,$$

and substituting in (b), we find

$$E(\bar{x}) = a.$$

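To calculate $\sigma_{\bar{x}}^2$ we use formula (c) of §2.74,

$$(c)\qquad \sigma_{\bar{x}}^2 = \frac{1}{n^2}\Big[\sum_{\alpha=1}^{n}\sigma_\alpha^2 + 2\sum_{\alpha<\beta}\sigma_\alpha\sigma_\beta\rho_{\alpha\beta}\Big].$$

Employing again the marginal distribution of $x_\alpha$, we get for the variance of $x_\alpha$,

$$E(x_\alpha^2) - [E(x_\alpha)]^2 = \sum_{i=1}^{N}[x^{(i)}]^2\Pr(x_\alpha = x^{(i)}) - a^2 = \sum_{i=1}^{N}[x^{(i)}]^2/N - a^2,$$

$$(d)\qquad \sigma_\alpha^2 = \sigma^2.$$

To find $\rho_{\alpha\beta}$ for $\alpha \ne \beta$, we use the joint marginal distribution of $x_\alpha$ and $x_\beta$. To simplify the notation, let $\alpha = 1$, $\beta = 2$. Then $\Pr(x_1 = x^{(i)}, x_2 = x^{(j)};\ i \ne j)$ is p times the number of points for which $x_1 = x^{(i)}$, $x_2 = x^{(j)} \ne x^{(i)}$, and no two of $x_1, x_2, \ldots, x_n$ are equal. To enumerate these points, note that we may choose $x_1, x_2$ in only one way, then $x_3$ in N-2 ways, $x_4$ in N-3 ways, etc. Hence

$$\Pr(x_1 = x^{(i)}, x_2 = x^{(j)};\ i \ne j) = p\,(N-2)(N-3)\cdots(N-n+1) = [N(N-1)]^{-1}.$$

$$\sigma_1\sigma_2\rho_{12} = E[(x_1-a)(x_2-a)] = \sum_{i \ne j}(x^{(i)}-a)(x^{(j)}-a)\Pr(x_1 = x^{(i)}, x_2 = x^{(j)})$$

$$= \sum_{i}(x^{(i)}-a)\sum_{j \ne i}(x^{(j)}-a)\big/[N(N-1)]$$

$$= \sum_{i}(x^{(i)}-a)(-x^{(i)}+a)\big/[N(N-1)]$$

$$= \sum_{i}(x^{(i)}-a)(-x^{(i)})\big/[N(N-1)]$$

$$= -\sum_{i}\big[(x^{(i)})^2/N - a\,x^{(i)}/N\big]\big/(N-1) = -\sigma^2/(N-1),$$

$$\rho_{12} = -1/(N-1).$$

Likewise,

$$(e)\qquad \rho_{\alpha\beta} = -1/(N-1)\quad\text{if } \alpha \ne \beta.$$

Combining (c), (d), (e), we have

$$n^2\sigma_{\bar{x}}^2 = n\sigma^2 + 2(1+2+3+\cdots+n-1)[-1/(N-1)]\sigma^2 = n\sigma^2 - n(n-1)\sigma^2/(N-1),$$

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\Big(\frac{N-n}{N-1}\Big).$$

We note that for n = N, $\sigma_{\bar{x}}^2 = 0$, that $\sigma_{\bar{x}}^2$ is a monotonic increasing function of N, and that as $N \to \infty$, $\sigma_{\bar{x}}^2 \to \sigma^2/n$ for fixed n.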
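Because the lattice of admissible sample points is finite, the formula $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$ can be confirmed by brute-force enumeration of all ordered samples without replacement. The sketch below does this exactly; the five-element population and the function name are assumptions introduced only for the check.

```python
from fractions import Fraction
from itertools import permutations


def sample_mean_moments(population, n):
    """Exact mean and variance of the sample mean over all equally
    likely ordered samples of size n drawn without replacement."""
    means = [Fraction(sum(s), n) for s in permutations(population, n)]
    count = len(means)
    expectation = sum(means) / count
    variance = sum((m - expectation) ** 2 for m in means) / count
    return expectation, variance


# Hypothetical population of N = 5 distinct values; samples of size n = 3.
population = [1, 2, 4, 7, 11]
N, n = len(population), 3
a = Fraction(sum(population), N)
sigma2 = sum((x - a) ** 2 for x in population) / N

expectation, variance = sample_mean_moments(population, n)
formula = sigma2 / n * Fraction(N - n, N - 1)
```

Here `permutations` generates exactly the N(N-1)...(N-n+1) lattice points that carry probability p, so the enumeration mirrors the counting argument of the text.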

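4.4 Representative Sampling

Suppose we have a population $\pi$ consisting of k mutually exclusive sub-populations $\pi_1, \pi_2, \ldots, \pi_k$, each with c. d. f. $F_i(x)$, that is,

$$F_i(x) = \Pr(X \le x \mid X \text{ from } \pi_i).$$

If X is drawn at random from $\pi$ let

$$p_i = \Pr(X \text{ from } \pi_i).$$

To find the c. d. f. of X we may proceed as follows:

$$F(x) = \Pr(X \le x) = \sum_{i=1}^{k}\Pr(X \text{ from } \pi_i)\cdot\Pr(X \le x \mid X \text{ from } \pi_i) = \sum_{i=1}^{k}p_iF_i(x).$$

Denoting the mean of F(x) by a, and its variance by $\sigma^2$, we calculate

$$a = \int_{-\infty}^{\infty}x\,dF(x) = \sum_{i=1}^{k}p_i\int_{-\infty}^{\infty}x\,dF_i(x) = \sum_{i=1}^{k}p_ia_i,$$

where $a_i$ is the mean of $F_i(x)$.

$$\sigma^2 + a^2 = \int_{-\infty}^{\infty}x^2\,dF(x) = \sum_{i=1}^{k}p_i\int_{-\infty}^{\infty}x^2\,dF_i(x) = \sum_{i=1}^{k}p_i(\sigma_i^2+a_i^2),$$

where $\sigma_i^2$ is the variance of $F_i(x)$. This may be written

$$\sigma^2 = \sum_{i=1}^{k}p_i[\sigma_i^2+(a_i-a)^2].$$

From §4.21 we have that if $\bar{x}$ is the mean of a sample of size n drawn at random from $\pi$ then

$$E(\bar{x}) = a,$$

$$(a)\qquad \sigma_{\bar{x}}^2 = \sigma^2/n = \sum_{i=1}^{k}p_i[\sigma_i^2+(a_i-a)^2]/n.$$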

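4.41 Sampling when the $p_i$ are known

We suppose the probabilities $p_i$ are known (the means $a_i$ are assumed throughout to be unknown). Let us draw a sample $O_n$ consisting of the following sub-samples: $O^{(1)}_{n_1}$ ($n_1$ elements from $\pi_1$), $O^{(2)}_{n_2}$ ($n_2$ elements from $\pi_2$), ..., $O^{(k)}_{n_k}$ ($n_k$ elements from $\pi_k$); $\sum_{i=1}^{k}n_i = n$. Call $\bar{x}_R$ the mean of $O_n$, and $\bar{x}_i$ the mean of $O^{(i)}_{n_i}$. Then

$$(a)\qquad \bar{x}_R = \sum_{i=1}^{k}n_i\bar{x}_i/n,$$

$$E(\bar{x}_R) = \sum_{i=1}^{k}E(\bar{x}_i)\,n_i/n = \sum_{i=1}^{k}a_in_i/n.$$

If we use $\bar{x}_R$ as an estimate of the mean a of $\pi$, we would like to have

$$E(\bar{x}_R) = a = \sum_{i=1}^{k}p_ia_i.$$

Since we do not know the $a_i$, we require

$$\sum_{i=1}^{k}a_ip_i = \sum_{i=1}^{k}a_in_i/n$$

for all $a_i$, and this uniquely determines the $n_i$ as $np_i$.

If $n_i = np_i$, then $O_n$ is called a representative sample from $\pi$. The advantages of representative sampling over random sampling from $\pi$ are implicit in

Theorem (A): The variance $\sigma_{\bar{x}_R}^2$ of the mean $\bar{x}_R$ of a representative sample and the variance $\sigma_{\bar{x}}^2$ of the mean $\bar{x}$ of a random sample of the same size have the relationship:

$$\sigma_{\bar{x}_R}^2 \le \sigma_{\bar{x}}^2,$$

the equality holding only when all $a_i$ are equal.

To prove the theorem, we calculate

$$\sigma_{\bar{x}_R}^2 = \sum_{i=1}^{k}(n_i/n)^2\,\sigma_{\bar{x}_i}^2$$

from (a) and the mutual independence of the $\bar{x}_i$. Now

$$\sigma_{\bar{x}_i}^2 = \sigma_i^2/n_i = \sigma_i^2/np_i.$$

Therefore

$$\sigma_{\bar{x}_R}^2 = \sum_{i=1}^{k}p_i\sigma_i^2/n.$$

Hence (a) of §4.4 may be written

$$\sigma_{\bar{x}}^2 = \sigma_{\bar{x}_R}^2 + \sum_{i=1}^{k}p_i(a_i-a)^2/n,$$

and the theorem follows.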


4.42 Sampling when the $\sigma_i$ are also known

We employ the same notation as in §4.41. If we use the mean of the sample to estimate a, we have just seen that the $n_i$ are uniquely determined by the requirement $E(\bar{x}_R) = a$.

Suppose however that we use as an estimate of a the statistic

$$(a)\qquad y = \sum_{i=1}^{k}c_i\bar{x}_i.$$

How should we choose the $c_i$ and the $n_i$, for fixed $n = \sum_{i=1}^{k}n_i$, so that

$$(b)\qquad E(y) = a,$$

and $\sigma_y^2$ is minimum (for the class of statistics satisfying (a) and (b))? The method of §4.41 shows that we must take $c_i = p_i$. Then

$$(c)\qquad \sigma_y^2 = \sum_{i=1}^{k}p_i^2\sigma_{\bar{x}_i}^2 = \sum_{i=1}^{k}p_i^2\sigma_i^2/n_i.$$

The problem is now to find the $n_i$ which minimize (c) subject to the condition that $\sum_{i=1}^{k}n_i = n$. Treating the $n_i$ as though they were continuous variables, and following the method of Lagrange, we form

$$g(n_1,n_2,\ldots,n_k,\lambda) = \sum_{i=1}^{k}p_i^2\sigma_i^2/n_i + \lambda\sum_{i=1}^{k}n_i,$$

and set

$$\partial g/\partial n_i = 0,\qquad i = 1,2,\ldots,k.$$

We get

$$-p_i^2\sigma_i^2/n_i^2 + \lambda = 0,$$

$$(d)\qquad n_i = p_i\sigma_i/\sqrt{\lambda}.$$

To evaluate $\lambda$ sum the equations (d) for i = 1, 2, ..., k and solve for $\lambda^{-1/2}$:

$$\lambda^{-1/2} = n\Big/\sum_{j=1}^{k}p_j\sigma_j.$$

The minimizing $n_i$ are thus

$$n_i = n\,p_i\sigma_i\Big/\sum_{j=1}^{k}p_j\sigma_j.$$

Putting these back in (c), we find the minimum variance to be

$$\sigma_y^2 = \Big(\sum_{i=1}^{k}p_i\sigma_i\Big)^2\Big/n.$$

With the help of the Schwarz inequality,

$$\Big(\sum_{i}a_ib_i\Big)^2 \le \Big(\sum_{i}a_i^2\Big)\Big(\sum_{i}b_i^2\Big)$$

(the equality holding only if the $a_i$ are proportional to the $b_i$), where we let $a_i = p_i^{1/2}$, $b_i = p_i^{1/2}\sigma_i$, we obtain

Theorem (A): $\sigma_y^2 \le \sigma_{\bar{x}_R}^2$, the equality holding only if all $\sigma_i$ are equal.
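The allocation rule $n_i = n\,p_i\sigma_i/\sum_j p_j\sigma_j$ and the comparison of the two variances can be illustrated with a short computation. In the sketch below the function names and the three sub-populations (with $p = 0.5, 0.3, 0.2$ and $\sigma = 1, 2, 5$) are hypothetical, and the $n_i$ are treated as continuous, as in the text.

```python
def optimal_allocation(n, p, sigma):
    """Sub-sample sizes n_i = n p_i sigma_i / sum_j p_j sigma_j (not rounded)."""
    weights = [pi * si for pi, si in zip(p, sigma)]
    total = sum(weights)
    return [n * w / total for w in weights]


def variance_of_estimate(p, sigma, sizes):
    """Variance of y = sum p_i xbar_i, i.e. sum p_i^2 sigma_i^2 / n_i."""
    return sum(pi ** 2 * si ** 2 / ni for pi, si, ni in zip(p, sigma, sizes))


# Hypothetical sub-population weights and standard deviations.
p = [0.5, 0.3, 0.2]
sigma = [1.0, 2.0, 5.0]
n = 100

best_sizes = optimal_allocation(n, p, sigma)
var_optimal = variance_of_estimate(p, sigma, best_sizes)

representative_sizes = [n * pi for pi in p]
var_representative = variance_of_estimate(p, sigma, representative_sizes)
```

With these numbers the optimal allocation gives variance $(\sum p_i\sigma_i)^2/n = 0.0441$, against $\sum p_i\sigma_i^2/n = 0.067$ for the representative sample. In practice the $n_i$ would be rounded to integers, which this sketch does not attempt.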

4.5 Sampling Theory of Order Statistics

4.51 Simultaneous Distribution of any k Order Statistics

Suppose $O_n:(x_1,x_2,\ldots,x_n)$ is a sample of size n from a population with probability element f(x)dx, and that $x_1, x_2, \ldots, x_n$ are arranged in ascending order of magnitude. These ordered values of x will be referred to as order statistics; more specifically, $x_\alpha$ will be called the $\alpha$-th order statistic. Let $r_1, r_2, \ldots, r_k$ be k integers such that $1 \le r_1 < r_2 < \cdots < r_k \le n$. The problem to be considered here is that of finding the probability element of $x_{r_1}, x_{r_2}, \ldots, x_{r_k}$, i. e.

$$(a)\qquad f(x_{r_1},x_{r_2},\ldots,x_{r_k})\,dx_{r_1}dx_{r_2}\cdots dx_{r_k}.$$

Let $I_1, I_2, I_3, \ldots, I_{2k+1}$ be the 2k+1 intervals

$$(b)\qquad (-\infty,x_{r_1}),\ (x_{r_1},x_{r_1}+dx_{r_1}),\ (x_{r_1}+dx_{r_1},x_{r_2}),\ (x_{r_2},x_{r_2}+dx_{r_2}),\ \ldots,\ (x_{r_k}+dx_{r_k},+\infty),$$

and let

$$(c)\qquad q_i = \int_{I_i}f(x)\,dx,\qquad i = 1,2,\ldots,2k+1.$$

The problem of finding the probability element (a) is identical with that of finding the probability (to terms of order $dx_{r_1}dx_{r_2}\cdots dx_{r_k}$) that if a sample of n elements is drawn from a multinomial population with classes $I_1, I_2, \ldots, I_{2k+1}$, then $r_1-1$ elements will fall in $I_1$, 1 element in $I_2$, $r_2-r_1-1$ elements in $I_3$, 1 element in $I_4$, ..., $n-r_k$ elements in $I_{2k+1}$. It follows from the multinomial law (§3.12) that the probability of such a partition is

$$(d)\qquad \frac{n!}{(r_1-1)!\,1!\,(r_2-r_1-1)!\,1!\cdots(n-r_k)!}\; q_1^{r_1-1}\,q_2\,q_3^{r_2-r_1-1}\,q_4\cdots q_{2k+1}^{n-r_k}.$$

Substituting the values of the $q_i$, and noting that, to within terms of order $dx_{r_1}$,

$$\int_{x_{r_1}}^{x_{r_1}+dx_{r_1}}f(x)\,dx = f(x_{r_1})\,dx_{r_1}\qquad\text{and}\qquad \int_{x_{r_1}+dx_{r_1}}^{x_{r_2}}f(x)\,dx = \int_{x_{r_1}}^{x_{r_2}}f(x)\,dx,$$

we have

$$(e)\qquad \frac{n!}{(r_1-1)!\,(r_2-r_1-1)!\cdots(n-r_k)!}\Big(\int_{-\infty}^{x_{r_1}}f(x)\,dx\Big)^{r_1-1}\Big(\int_{x_{r_1}}^{x_{r_2}}f(x)\,dx\Big)^{r_2-r_1-1}\cdots\Big(\int_{x_{r_k}}^{\infty}f(x)\,dx\Big)^{n-r_k}f(x_{r_1})\,dx_{r_1}\cdots f(x_{r_k})\,dx_{r_k}.$$
The distribution function (e) has many applications, some of which will now be considered briefly.

4.52 Distribution of Largest (or Smallest) Variate

In this case k = 1, $r_1 = n$; (e) of §4.51 then becomes the probability element of the largest element $x_n$,

$$n\Big(\int_{-\infty}^{x_n}f(x)\,dx\Big)^{n-1}f(x_n)\,dx_n,$$

a similar expression holding for the probability element of the smallest element.

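4.53 Distribution of Median

In this case let the number of elements in the sample be 2n + 1. We would have k = 1, $r_1 = n + 1$, and (e) of §4.51 will be the probability element of the sample median $x_{n+1}$. Denoting the median by $\tilde{x}$, we have

$$(a)\qquad \frac{(2n+1)!}{(n!)^2}\Big(\int_{-\infty}^{\tilde{x}}f(x)\,dx\Big)^n\Big(\int_{\tilde{x}}^{\infty}f(x)\,dx\Big)^nf(\tilde{x})\,d\tilde{x}.$$

The asymptotic distribution of the median for large n may be derived from (a). If $x_0$ is the population median then $\int_{-\infty}^{x_0}f(x)\,dx = \frac12$. Therefore

$$\int_{-\infty}^{\tilde{x}}f(x)\,dx = \frac12 + \int_{x_0}^{\tilde{x}}f(x)\,dx\qquad\text{and}\qquad \int_{\tilde{x}}^{\infty}f(x)\,dx = \frac12 - \int_{x_0}^{\tilde{x}}f(x)\,dx,$$

and hence (a) may be written as

$$(b)\qquad \frac{(2n+1)!}{(n!)^2}\Big[\frac14 - \Big(\int_{x_0}^{\tilde{x}}f(x)\,dx\Big)^2\Big]^nf(\tilde{x})\,d\tilde{x}.$$

We may write $\int_{x_0}^{\tilde{x}}f(x)\,dx = \bar{f}\cdot(\tilde{x}-x_0)$, where

$$\min_{x \in I}f(x) \le \bar{f} \le \max_{x \in I}f(x),$$

and I is the interval $(x_0,\tilde{x})$ or $(\tilde{x},x_0)$.

Let $\sqrt{n}\,(\tilde{x}-x_0) = y$. Then (b) becomes

$$(c)\qquad \frac{(2n+1)!}{(n!)^2\,2^{2n}\sqrt{n}}\Big[1-\frac{4\bar{f}^2y^2}{n}\Big]^nf\Big(x_0+\frac{y}{\sqrt{n}}\Big)\,dy.$$

We now choose any value of y, hold it fixed, and let $n \to \infty$. If f(x) is continuous at $x = x_0$ and $f(x_0) \ne 0$, then $f(x_0+y/\sqrt{n}) \to f(x_0)$, $\bar{f} \to f(x_0)$, and with the help of Stirling's formula for the factorials, we thus get as the limit of (c) as $n \to \infty$,

$$(d)\qquad \frac{1}{\sqrt{2\pi}\,\sigma_y}\, e^{-\frac{y^2}{2\sigma_y^2}}\,dy,$$

where $\sigma_y^2 = 1/8[f(x_0)]^2$. Hence the median $\tilde{x}$ in samples of size 2n + 1 is asymptotically normally distributed with mean $x_0$ and variance $1/8n[f(x_0)]^2$. It is of interest to note that this asymptotic distribution depends only on the $x_0$ and $f(x_0)$ of the population.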

Example : For the normal distribution 


f(x) 


2o-' 


:(x-a)^ 


VirFcr 


we have x^ = a, f(5^) * i/VStfor Therefore, the variance cr| of x in samples of 
size 2n + 1 from a normal distribution with variance is 'rrc^/kn, approximately. 
It will be recalled from §4.21 that the variance a| of the mean of a sample of size 
2n + 1 is cr^ /f2n+l). Hence, for large samples from a normal population, the mean has 
smaller variance than the median. 
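The comparison in this example can be checked roughly by simulation. The following is only a sketch under stated assumptions: the function names, the seed, the choice $n = 50$ and the number of replications are illustrative and not part of the text, and the simulated ratio is expected to be near 1 rather than exactly 1, since $\pi\sigma^2/4n$ is an asymptotic value.

```python
import math
import random
import statistics

def simulated_median_variance(n, sigma=1.0, reps=4000, seed=12345):
    """Empirical variance of the median of 2n+1 draws from N(0, sigma^2)."""
    rng = random.Random(seed)          # fixed seed (an arbitrary choice) for repeatability
    size = 2 * n + 1
    medians = [
        statistics.median(rng.gauss(0.0, sigma) for _ in range(size))
        for _ in range(reps)
    ]
    return statistics.pvariance(medians)

def asymptotic_median_variance(n, sigma=1.0):
    """Large-sample value pi*sigma^2/(4n) from the normal example."""
    return math.pi * sigma ** 2 / (4 * n)

n = 50
ratio = simulated_median_variance(n) / asymptotic_median_variance(n)
mean_variance = 1.0 / (2 * n + 1)      # variance of the mean, sigma^2/(2n+1), with sigma = 1
print(ratio, asymptotic_median_variance(n) / mean_variance)
```

The second printed number is the ratio of the approximate variance of the median to the variance of the mean; for large $n$ it approaches $\pi/2$.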


In a similar manner one could treat the problem of finding the sampling distribution of the lower quartile of a sample (the $(n+1)$st element in rank order in a sample of size $4n+3$), and other particular order statistics.

4.54 Distribution of Sample Range

The joint distribution of the largest and smallest values of $x$ in the sample is given by (e) of §4.51 with $k = 2$, $r_1 = 1$, $r_2 = n$. We have

$$n(n-1)\Bigl(\int_{x_1}^{x_n} f(x)\,dx\Bigr)^{n-2} f(x_1) f(x_n)\,dx_1\,dx_n. \tag{a}$$

To obtain the distribution of the sample range $R$, we make the following transformation

$$x_n - x_1 = R, \qquad x_1 = S, \tag{b}$$

and integrate the resulting distribution with respect to $S$.

Example: Suppose $x$ has the rectangular distribution

$$f(x) = \begin{cases} 1/r, & 0 < x < r,\\ 0, & \text{otherwise.}\end{cases} \tag{c}$$

We have for (a),

$$n(n-1)\,r^{-n}(x_n - x_1)^{n-2}\,dx_1\,dx_n. \tag{d}$$

Applying transformation (b) and integrating with respect to $S$ from $0$ to $r - R$, we obtain as the probability element of the range in samples of size $n$ from the rectangular distribution

$$n(n-1)\,r^{-n} R^{\,n-2}(r-R)\,dR. \tag{e}$$

4.55 Tolerance Limits

The joint distribution of the smallest and largest values of $x$ in the sample is given by (a) of §4.54. Now suppose we set

$$\int_{-\infty}^{x_1} f(x)\,dx = u, \qquad \int_{x_1}^{x_n} f(x)\,dx = v.$$

We have

$$\frac{\partial(x_1,x_n)}{\partial(u,v)} = \frac{1}{\dfrac{\partial(u,v)}{\partial(x_1,x_n)}} = \frac{1}{f(x_1)\,f(x_n)},$$

and hence the joint distribution of $u$ and $v$ is

$$n(n-1)\,v^{\,n-2}\,du\,dv, \tag{b}$$

and the region of non-zero probability density is the triangle bounded by $u = 0$, $v = 0$, $u + v = 1$. The probability element (b) clearly does not depend on the probability density function $f(x)$. Integrating with respect to $u$ from $0$ to $1 - v$, we find the probability element of $v$ to be

$$n(n-1)\,v^{\,n-2}(1-v)\,dv. \tag{c}$$

It will be seen that $v$ is the amount of the probability in the distribution $f(x)$ included between $x_1$ and $x_n$ (or statistically speaking, it is the proportion of the population included between $x_1$ and $x_n$, i.e., between the least and greatest values of a sample of size $n$). From expression (c) one can determine the sample size $n$ such that the probability is $\epsilon$ that at least $100\beta\%$ of the population will be included between the least and greatest value of the sample. Such a value of $n$ would be obtained by solving the following equation for $n$:

$$n(n-1)\int_{\beta}^{1} v^{\,n-2}(1-v)\,dv = \epsilon,$$

or

$$n\beta^{\,n-1} - (n-1)\beta^{\,n} = 1 - \epsilon.$$


Example: For $\epsilon = .99$ and $\beta = .95$, we find $n = 130$. Thus, if a sample of 130 cases is drawn from a population in which the random variable $x$ is continuous, the probability is .99 that the least and greatest values of $x$ in the sample will include at least 95% of the population.
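The equation $n\beta^{n-1} - (n-1)\beta^{n} = 1 - \epsilon$ has no closed-form solution for $n$, but a direct search is easy. The sketch below is illustrative only (the function names are hypothetical); it looks for the smallest $n$ for which the probability that the sample extremes cover at least a proportion $\beta$ of the population reaches $\epsilon$.

```python
def coverage_probability(n, beta):
    """Pr(v >= beta) = 1 - [n*beta^(n-1) - (n-1)*beta^n] for sample size n."""
    return 1.0 - (n * beta ** (n - 1) - (n - 1) * beta ** n)

def smallest_sample_size(beta, eps):
    """Smallest n >= 2 with Pr(v >= beta) >= eps (hypothetical helper name)."""
    n = 2
    while coverage_probability(n, beta) < eps:
        n += 1
    return n

print(smallest_sample_size(0.95, 0.99))  # 130
print(smallest_sample_size(0.99, 0.95))  # 473
```

With $\beta = .95$ and $\epsilon = .99$ the search gives $n = 130$; interchanging the two values, $\beta = .99$ and $\epsilon = .95$, gives the much larger $n = 473$.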

$x_1$ and $x_n$ are examples of tolerance limits. More generally, two functions of the sample values, say $L_1(x_1,x_2,\ldots,x_n)$ and $L_2(x_1,x_2,\ldots,x_n)$, will be called $100\beta\%$ distribution-free tolerance limits at probability level $\epsilon$ if

$$\Pr\Bigl(\int_{L_1}^{L_2} f(x)\,dx \ge \beta\Bigr) = \epsilon, \tag{f}$$

for all possible probability density functions $f(x)$.

If the functional form of $f(x)$ is known but depends on one or more parameters $\theta_1,\theta_2,\ldots,\theta_h$, and if $L_1$ and $L_2$ are such that (f) holds for all possible values of the parameters, we shall call $L_1$ and $L_2$ $100\beta\%$ parameter-free tolerance limits at probability level $\epsilon$.

If we denote by $u_1, u_2, \ldots, u_k$ the quantities

$$\int_{-\infty}^{x_{r_1}} f(x)\,dx,\quad \int_{x_{r_1}}^{x_{r_2}} f(x)\,dx,\quad \ldots,\quad \int_{x_{r_{k-1}}}^{x_{r_k}} f(x)\,dx,$$

respectively, it is easy to verify, in a manner similar to our treatment of the distribution of $u$ and $v$, that the probability element of $u_1,u_2,\ldots,u_k$ is

$$\frac{n!}{(r_1-1)!\,(r_2-r_1-1)!\cdots(n-r_k)!}\; u_1^{\,r_1-1}\, u_2^{\,r_2-r_1-1}\cdots u_k^{\,r_k-r_{k-1}-1}\,(1-u_1-\cdots-u_k)^{\,n-r_k}\; du_1\,du_2\cdots du_k,$$

a result which is independent of $f(x)$. The domain over which this density function is defined is the region for which $u_i \ge 0$ $(i = 1,2,\ldots,k)$ and $\sum_{i=1}^{k} u_i \le 1$.

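4.6 Mean Values of Sample Moments when Sample Values are Grouped; Sheppard Corrections

Suppose that $x$ is a continuous random variable having probability element $f(x)\,dx$, and that $O_n$ is a sample from a population having this distribution. Let the $x$ axis be divided into non-overlapping intervals of equal length $\delta$, suppose $I_0$ is the interval including the origin, and let $h$ be the $x$-coordinate of the center of $I_0$. Denote the intervals by $\ldots, I_{-2}, I_{-1}, I_0, I_1, I_2, \ldots$, where the end points of $I_i$ are $\bigl(h+(i-\tfrac12)\delta,\ h+(i+\tfrac12)\delta\bigr)$, $i = 0, \pm1, \pm2, \ldots$. Let

$$p_i = \int_{h+(i-\frac12)\delta}^{h+(i+\frac12)\delta} f(x)\,dx, \tag{a}$$

the probability associated with $I_i$. If $f(x)$ is identically zero outside some finite interval there will be only a finite number of non-zero $p_i$; otherwise there will be a convergent series of $p_i$. Let $n_i$ be the number of $x$'s in $O_n$ falling into $I_i$, and let the value of each of these $x$'s be replaced by $h + i\delta$, the midpoint of $I_i$. Let ${}_\delta M'_r$ be the $r$-th "grouped" moment of the sample, defined as follows

$${}_\delta M'_r = \frac1n \sum_i n_i\,(h+i\delta)^r. \tag{b}$$

It will be noted that ${}_\delta M'_r$ is the "grouped" analogue of

$$M'_r = \frac1n \sum_{\alpha=1}^{n} x_\alpha^{\,r}, \tag{c}$$

where $x_1,x_2,\ldots,x_n$ are the values of $x$ in the sample. In fact $\lim_{\delta\to0} {}_\delta M'_r = M'_r$. It is easy to verify that $E(M'_r) = \mu'_r$, where

$$\mu'_r = \int_{-\infty}^{\infty} x^r f(x)\,dx. \tag{d}$$

The problem to be considered here is that of finding $E({}_\delta M'_r)$, where $h$ is a continuous random variable distributed uniformly (i.e., with probability element $dh/\delta$) on the interval $(-\tfrac12\delta, \tfrac12\delta)$. For a given $\delta$, the random variables involved in the grouping problem are the $n_i$ and $h$. The conditional probability law of the $n_i$ given $h$ is the multinomial distribution

$$P = \frac{n!}{\prod_i n_i!}\,\prod_i p_i^{\,n_i}. \tag{e}$$

Now we have

$$E({}_\delta M'_r) = \frac1\delta \int_{-\frac12\delta}^{\frac12\delta} \sum {}_\delta M'_r\, P\,dh, \tag{f}$$

where $\sum$ denotes summation over all positive integral or zero values of the $n_i$ such that $\sum_i n_i = n$. The m. g. f. of ${}_\delta M'_r$ is

$$\phi(\theta) = E\bigl(e^{\theta\,{}_\delta M'_r}\bigr) = \frac1\delta \int_{-\frac12\delta}^{\frac12\delta} \sum e^{\theta\,{}_\delta M'_r}\, P\,dh = \frac1\delta \int_{-\frac12\delta}^{\frac12\delta} \Bigl[\sum_i p_i\, e^{\theta (h+i\delta)^r/n}\Bigr]^{n} dh. \tag{g}$$

If the m. g. f. does not exist then the characteristic function (obtained by replacing $\theta$ by $\sqrt{-1}\,\theta$) will exist, since the $p_i$ are positive and will form a convergent series if there is not a finite number of them. We now have

$$E({}_\delta M'_r) = \phi'(0) = \frac1\delta \int_{-\frac12\delta}^{\frac12\delta} \sum_i p_i\,(h+i\delta)^r\,dh. \tag{h}$$

Making use of (a) we may write

$$E({}_\delta M'_r) = \frac1\delta \int_{-\frac12\delta}^{\frac12\delta} \sum_i \int_{h+(i-\frac12)\delta}^{h+(i+\frac12)\delta} f(x)\,(h+i\delta)^r\,dx\,dh. \tag{i}$$

Setting $h + i\delta = y$, we have

$$E({}_\delta M'_r) = \frac1\delta \sum_i \int_{(i-\frac12)\delta}^{(i+\frac12)\delta} \int_{y-\frac12\delta}^{y+\frac12\delta} f(x)\,y^r\,dx\,dy = \frac1\delta \int_{-\infty}^{\infty} \int_{y-\frac12\delta}^{y+\frac12\delta} f(x)\,y^r\,dx\,dy. \tag{j}$$

Interchanging the order of integration, we obtain

$$E({}_\delta M'_r) = \frac1\delta \int_{-\infty}^{\infty} \int_{x-\frac12\delta}^{x+\frac12\delta} f(x)\,y^r\,dy\,dx = \frac{1}{(r+1)\,\delta} \int_{-\infty}^{\infty} \bigl[(x+\tfrac12\delta)^{r+1} - (x-\tfrac12\delta)^{r+1}\bigr] f(x)\,dx. \tag{k}$$

In particular, for $r = 1, 2, 3$, (k) becomes

$$E({}_\delta M'_1) = \int_{-\infty}^{\infty} x\,f(x)\,dx = \mu'_1,$$

$$E({}_\delta M'_2) = \int_{-\infty}^{\infty} \Bigl(x^2 + \frac{\delta^2}{12}\Bigr) f(x)\,dx = \mu'_2 + \frac{\delta^2}{12},$$

$$E({}_\delta M'_3) = \int_{-\infty}^{\infty} \Bigl(x^3 + \frac{\delta^2}{4}\,x\Bigr) f(x)\,dx = \mu'_3 + \frac{\delta^2}{4}\,\mu'_1.$$

It will be noted that ${}_\delta M'_1$, $\bigl({}_\delta M'_2 - \tfrac{\delta^2}{12}\bigr)$, and $\bigl({}_\delta M'_3 - \tfrac{\delta^2}{4}\,{}_\delta M'_1\bigr)$ are unbiased (§6.21) estimates of $\mu'_1$, $\mu'_2$, $\mu'_3$. The quantities $-\tfrac{\delta^2}{12}$ and $-\tfrac{\delta^2}{4}\,{}_\delta M'_1$ are called Sheppard corrections of ${}_\delta M'_2$ and ${}_\delta M'_3$. Such corrections can be obtained for higher values of $r$ by further use of (k). Similarly one can determine Sheppard corrections for grouped moments about the sample mean, as defined by

$${}_\delta M_r = \frac1n \sum_i n_i \Bigl[(h+i\delta) - \frac1n \sum_j n_j\,(h+j\delta)\Bigr]^{r}.$$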
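The mean values given by (k) can be checked numerically. The following sketch assumes a standard normal population and an interval length $\delta = 0.5$ (both are illustrative choices, as are the function names); it averages $\sum_i p_i (h+i\delta)^r$ over $h$ on $(-\tfrac12\delta, \tfrac12\delta)$ and compares the result for $r = 2$ with $\mu'_2 + \delta^2/12$.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mean_grouped_moment(r, delta, cells=80, steps=200):
    """Average over h of sum_i p_i (h + i*delta)^r for a N(0,1) population."""
    total = 0.0
    for s in range(steps):
        # midpoint rule for h on (-delta/2, delta/2)
        h = -0.5 * delta + (s + 0.5) * delta / steps
        for i in range(-cells, cells + 1):
            mid = h + i * delta                      # midpoint of interval I_i
            p_i = normal_cdf(mid + 0.5 * delta) - normal_cdf(mid - 0.5 * delta)
            total += p_i * mid ** r
    return total / steps

delta = 0.5
grouped_second = mean_grouped_moment(2, delta)
print(grouped_second, 1.0 + delta ** 2 / 12.0)
```

The two printed values should agree closely; subtracting $\delta^2/12$ from the first recovers the population second moment, which is the Sheppard correction for $r = 2$.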


4.7 Appendix on Lagrange's Multipliers

We frequently encounter the problem of finding the extreme (maximum or minimum) value of a function $g(x_1,\ldots,x_n)$ subject to side conditions

$$\phi_j(x_1,\ldots,x_n) = 0, \qquad j = 1,\ldots,k < n. \tag{a}$$

To insure the independence of the conditions (a) we assume that for some $x_{n_1},\ldots,x_{n_k}$,

$$\frac{\partial(\phi_1,\ldots,\phi_k)}{\partial(x_{n_1},\ldots,x_{n_k})} \ne 0$$

at the extremum. To simplify the notation, assume $n_i = i$, $i = 1,\ldots,k$. At the extremum, $dg = 0$, i.e.,

$$\sum_{i=1}^{n} \frac{\partial g}{\partial x_i}\,dx_i = 0, \tag{b}$$

where $dx_1,\ldots,dx_k$ are functions of $dx_{k+1},\ldots,dx_n$ determined by $d\phi_j = 0$, i.e.,

$$\sum_{i=1}^{n} \frac{\partial \phi_j}{\partial x_i}\,dx_i = 0, \qquad j = 1,\ldots,k, \tag{c}$$

and $dx_{k+1},\ldots,dx_n$ are completely arbitrary numbers. In order that (b) be satisfied for all $dx_1,\ldots,dx_n$, which are arbitrary except that they must satisfy (c), a necessary and sufficient condition is that the equation (b) be a linear combination of equations (c), i.e., that for some $\lambda_1,\ldots,\lambda_k$,

$$\frac{\partial g}{\partial x_i} + \sum_{j=1}^{k} \lambda_j\,\frac{\partial \phi_j}{\partial x_i} = 0, \qquad i = 1,\ldots,n. \tag{d}$$

We see that the conditions (d) are obtained if we employ the following rule: To minimize $g$ subject to (a), form the function

$$G = g + \sum_{j=1}^{k} \lambda_j \phi_j$$

and set

$$\frac{\partial G}{\partial x_i} = 0, \qquad i = 1,\ldots,n. \tag{e}$$

The equations (a) and (e) constitute a system of $n+k$ equations in $n+k$ unknowns $x_1,\ldots,x_n$; $\lambda_1,\ldots,\lambda_k$. For an extremum it is necessary that $x_1,\ldots,x_n$ satisfy these equations. In most applications in statistics the question of sufficiency can be settled in an obvious manner.
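As a small illustration of the rule, consider minimizing $g = x_1^2 + 2x_2^2$ subject to the single side condition $\phi = x_1 + x_2 - 1 = 0$. This example and the helper name `solve_linear` are assumptions made here for illustration; they do not appear in the text. With $G = g + \lambda\phi$, equations (e) and (a) give the linear system $2x_1 + \lambda = 0$, $4x_2 + \lambda = 0$, $x_1 + x_2 = 1$, which the sketch below solves exactly.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # pick a row with a non-zero entry in this column and move it into place
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

# Unknowns are (x1, x2, lam): dG/dx1 = 0, dG/dx2 = 0, and the side condition.
coeffs = [[2, 0, 1],
          [0, 4, 1],
          [1, 1, 0]]
rhs = [0, 0, 1]
x1, x2, lam = solve_linear(coeffs, rhs)
print(x1, x2, lam)   # 2/3 1/3 -4/3
```

Here the $n + k = 3$ equations determine $x_1 = 2/3$, $x_2 = 1/3$ and $\lambda = -4/3$; that this stationary point is a minimum is evident because $g$ is a positive definite quadratic form.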



CHAPTER V


SAMPLING FROM A NORMAL POPULATION


Since the normal distribution appears in such a wide variety of problems, we shall consider in detail certain sampling problems from such a distribution. Many distributions are important in statistics for the reason that they arise in connection with sampling from a normal universe. In the present chapter, we shall only consider certain sampling problems, deriving certain sampling distributions. The application of these sampling problems to problems of significance tests, statistical estimation, etc., will be made in later chapters.

5.1 Distribution of Sample Mean

An important property of the normal distribution is the so-called reproductive property. We wish to demonstrate that a linear function of normally distributed variates is again normally distributed. Suppose $x_1, x_2, \ldots, x_n$ are distributed independently according to $N(a_1,\sigma_1^2), N(a_2,\sigma_2^2), \ldots, N(a_n,\sigma_n^2)$, respectively. Let us find the distribution of the linear form $L = l_1x_1 + l_2x_2 + \cdots + l_nx_n$. According to the results of §2.74, the expected value of $L$ is

$$E(l_1x_1 + l_2x_2 + \cdots + l_nx_n) = l_1E(x_1) + l_2E(x_2) + \cdots + l_nE(x_n) = l_1a_1 + l_2a_2 + \cdots + l_na_n. \tag{a}$$

The joint distribution of the $x$'s is

$$\frac{1}{(2\pi)^{n/2}\,\sigma_1\sigma_2\cdots\sigma_n}\; e^{-\frac12\left[\frac{(x_1-a_1)^2}{\sigma_1^2} + \cdots + \frac{(x_n-a_n)^2}{\sigma_n^2}\right]}\,dx_1\cdots dx_n. \tag{b}$$

From this we shall find the moment generating function of the linear form minus its mean, $L - E(L)$,

$$\begin{aligned}
\phi(\theta) &= E\bigl(e^{\theta[L-E(L)]}\bigr)\\
&= \frac{1}{(2\pi)^{n/2}\,\sigma_1\cdots\sigma_n}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} e^{-\frac12\sum_{i=1}^{n}\frac{(x_i-a_i)^2}{\sigma_i^2} + \theta[l_1(x_1-a_1)+\cdots+l_n(x_n-a_n)]}\,dx_1\cdots dx_n\\
&= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i}\int_{-\infty}^{\infty} e^{-\frac{1}{2\sigma_i^2}(x_i-a_i-\sigma_i^2l_i\theta)^2 + \frac12\sigma_i^2l_i^2\theta^2}\,dx_i\\
&= e^{\frac12\theta^2\sum_{i=1}^{n} l_i^2\sigma_i^2}.
\end{aligned} \tag{c}$$

This is the moment generating function for the probability element

$$\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-y^2/2\sigma^2}\,dy, \tag{d}$$

where $\sigma^2 = \sum_{i=1}^{n} l_i^2\sigma_i^2$. Therefore $L$ is distributed according to $N\bigl(\sum_i l_ia_i,\ \sum_i l_i^2\sigma_i^2\bigr)$. We have the

Theorem (A): If $x_1, x_2, \ldots, x_n$ are independently distributed according to $N(a_1,\sigma_1^2), N(a_2,\sigma_2^2), \ldots, N(a_n,\sigma_n^2)$, respectively, then any linear function of the $x$'s, $l_1x_1 + l_2x_2 + \cdots + l_nx_n$, is distributed according to $N\bigl(\sum_i l_ia_i,\ \sum_i l_i^2\sigma_i^2\bigr)$.

From this result we can easily derive the distribution of the mean of a sample. Consider a sample, $O_n$, of $n$ observations $x_1,x_2,\ldots,x_n$. The $x$'s are independently distributed, each according to $N(a,\sigma^2)$. If we take $l_i = \frac1n$, the linear form $L$ is simply $\bar{x}$, the mean of the sample. Its expected value is

$$E(\bar{x}) = \frac1n a + \frac1n a + \cdots + \frac1n a = a. \tag{e}$$

Its variance is

$$\sigma_{\bar{x}}^2 = \frac{1}{n^2}\sigma^2 + \frac{1}{n^2}\sigma^2 + \cdots + \frac{1}{n^2}\sigma^2 = \frac{\sigma^2}{n}. \tag{f}$$

Therefore, we have the following corollary to Theorem (A):

If $O_n : x_1,x_2,\ldots,x_n$ is a sample from the normal population $N(a,\sigma^2)$, then the sample mean $\bar{x}$ is distributed according to $N(a,\sigma^2/n)$.
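Theorem (A) reduces the distribution of a linear form to two sums, which can be evaluated directly. The short sketch below (the function name and the numerical values are illustrative assumptions) computes the mean and variance of $L$ and applies them to the sample mean, for which every $l_i = 1/n$.

```python
from fractions import Fraction

def linear_form_law(coeffs, means, variances):
    """Mean and variance of L = sum l_i x_i for independent normal x_i."""
    mean = sum(l * a for l, a in zip(coeffs, means))
    variance = sum(l * l * v for l, v in zip(coeffs, variances))
    return mean, variance

n = 4
a, sigma2 = Fraction(3), Fraction(2)
mean_law = linear_form_law([Fraction(1, n)] * n, [a] * n, [sigma2] * n)
print(mean_law)   # (Fraction(3, 1), Fraction(1, 2))
```

The result $(a, \sigma^2/n) = (3, 1/2)$ agrees with the corollary for these assumed values.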

5.11 Distribution of Difference between Two Sample Means

Suppose we have two samples, $O_n$ and $O_{n'}$, of $n$ and $n'$ observations drawn from normal populations, $N(a,\sigma^2)$ and $N(a',\sigma'^2)$, respectively. Then the two sample means, $\bar{x}$ and $\bar{x}'$, are distributed according to $N(a,\sigma^2/n)$ and $N(a',\sigma'^2/n')$, respectively. To find the distribution of the difference of the two means, let us consider the linear function $\bar{x} - \bar{x}'$. In this case $l_1 = 1$, $l_2 = -1$; so the expected value of the linear form is

$$E(\bar{x} - \bar{x}') = a - a', \tag{a}$$

and its variance is

$$\sigma_{\bar{x}-\bar{x}'}^2 = \frac{\sigma^2}{n} + \frac{\sigma'^2}{n'}. \tag{b}$$

We therefore have the following corollary to Theorem (A):

Corollary (A$_2$): If $O_n : x_1,x_2,\ldots,x_n$ and $O_{n'} : x'_1,x'_2,\ldots,x'_{n'}$ are samples from the populations $N(a,\sigma^2)$ and $N(a',\sigma'^2)$, respectively, then $\bar{x} - \bar{x}'$ is distributed according to $N\bigl(a-a',\ \frac{\sigma^2}{n} + \frac{\sigma'^2}{n'}\bigr)$, where $\bar{x}$ and $\bar{x}'$ are the means of $O_n$ and $O_{n'}$, respectively.

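5.12 Joint Distribution of Means in Samples from a Normal Bivariate Distribution

Let us consider a sample $O_n(x_{1\alpha},x_{2\alpha};\ \alpha = 1,2,\ldots,n)$ from the bivariate distribution

$$\frac{\sqrt{A}}{2\pi}\; e^{-\frac12\sum_{i,j=1}^{2} A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\,dx_2, \tag{a}$$

where

$$A_{11} = \frac{1}{\sigma_1^2(1-\rho^2)},\quad A_{22} = \frac{1}{\sigma_2^2(1-\rho^2)},\quad A_{12} = A_{21} = \frac{-\rho}{\sigma_1\sigma_2(1-\rho^2)},$$

and $A = |A_{ij}|$.

Let $\bar{x}_i = \frac1n\sum_{\alpha=1}^{n} x_{i\alpha}$, $i = 1,2$. We wish to determine the joint distribution of $\bar{x}_1$, $\bar{x}_2$. To do this, we determine the m. g. f. of $(\bar{x}_1-a_1)$ and $(\bar{x}_2-a_2)$, i.e.,

$$\begin{aligned}
\phi(\theta_1,\theta_2) &= E\bigl(e^{\theta_1(\bar{x}_1-a_1)+\theta_2(\bar{x}_2-a_2)}\bigr)\\
&= \Bigl(\frac{\sqrt{A}}{2\pi}\Bigr)^{n}\underbrace{\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}}_{2n\text{-fold}} e^{-\frac12\sum_{\alpha=1}^{n}\sum_{i,j=1}^{2}A_{ij}(x_{i\alpha}-a_i)(x_{j\alpha}-a_j) + \sum_{\alpha=1}^{n}\sum_{i=1}^{2}\frac{\theta_i}{n}(x_{i\alpha}-a_i)}\,dx_{11}\,dx_{21}\cdots dx_{1n}\,dx_{2n}\\
&= \Bigl[\frac{\sqrt{A}}{2\pi}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-\frac12\sum_{i,j=1}^{2}A_{ij}(x_i-a_i)(x_j-a_j) + \sum_{i=1}^{2}\frac{\theta_i}{n}(x_i-a_i)}\,dx_1\,dx_2\Bigr]^{n}.
\end{aligned} \tag{b}$$

But we know from (d) and (e) of §3.22 that, with $\theta_i/n$ in place of $\theta_i$, the expression inside $[\ ]$ will be

$$e^{\frac{1}{2n^2}\sum_{i,j=1}^{2} A^{ij}\theta_i\theta_j}.$$

Therefore, the m. g. f. of $(\bar{x}_1-a_1)$ and $(\bar{x}_2-a_2)$ is

$$\phi(\theta_1,\theta_2) = e^{\frac{1}{2n}\sum_{i,j=1}^{2} A^{ij}\theta_i\theta_j}. \tag{c}$$

Since $e^{\frac12\sum_{i,j} A^{ij}\theta_i\theta_j}$ is the m. g. f. for $(x_1-a_1)$ and $(x_2-a_2)$ in distribution (b) §3.22, it follows that the distribution of $(\bar{x}_1-a_1)$, $(\bar{x}_2-a_2)$ (having m. g. f. (c)) is

$$\frac{n\sqrt{A}}{2\pi}\; e^{-\frac{n}{2}\sum_{i,j=1}^{2} A_{ij}(\bar{x}_i-a_i)(\bar{x}_j-a_j)}\,d\bar{x}_1\,d\bar{x}_2. \tag{d}$$

We therefore have

Theorem (B): If $x_1$ and $x_2$ are distributed according to the normal bivariate law (a) §5.12, and if $\bar{x}_1$ and $\bar{x}_2$ are the sample means of the $x_{1\alpha}$ and the $x_{2\alpha}$, respectively, in a sample $O_n(x_{i\alpha};\ i = 1,2;\ \alpha = 1,2,\ldots,n)$ from such a distribution, then $\bar{x}_1$ and $\bar{x}_2$ are also distributed according to a normal bivariate distribution given by (d).

Theorem (B) extends at once to the case of means in a sample from a $k$-variate normal population with distribution (b) §3.23. The distribution of the means in this case is

$$\frac{n^{k/2}\sqrt{A}}{(2\pi)^{k/2}}\; e^{-\frac{n}{2}\sum_{i,j=1}^{k} A_{ij}(\bar{x}_i-a_i)(\bar{x}_j-a_j)}\,d\bar{x}_1\cdots d\bar{x}_k. \tag{e}$$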


5.2 The $\chi^2$-distribution

The $\chi^2$-distribution function with $m$ degrees of freedom is defined as

$$f_m(\chi^2)\,d(\chi^2) = \frac{(\chi^2)^{\frac{m}{2}-1}\, e^{-\frac12\chi^2}}{2^{\frac{m}{2}}\,\Gamma(\frac{m}{2})}\,d(\chi^2). \tag{a}$$

This distribution arises very frequently in connection with sampling theory of quadratic forms of normally distributed variables. We shall consider some of the important cases in this chapter and others in Chapters VIII and IX.

The integrals $\int_0^{\chi_0^2} f_m(\chi^2)\,d\chi^2$ and $\int_{\chi_0^2}^{\infty} f_m(\chi^2)\,d\chi^2$ are tabulated in many places for various values of $m$ and $\chi_0^2$. When we let $\chi^2 = 2t$, the latter integral is transformed into the Incomplete Gamma Function, of which extensive tables have been computed by Karl Pearson.

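5.21 Distribution of Sum of Squares of Normally and Independently Distributed Variables

The simplest sample statistic which is distributed according to the $\chi^2$-law is the sum of squares of variates independently distributed according to the same normal law with zero mean. Let us use the method of moment generating functions to find the distribution of $\chi^2 = \sum_{i=1}^{n} x_i^2$, where each $x_i$ $(i = 1,2,\ldots,n)$ is independently distributed according to $N(0,1)$. The joint distribution of the $x$'s is

$$\frac{1}{(2\pi)^{n/2}}\; e^{-\frac12\sum_{i=1}^{n} x_i^2}\,dx_1\cdots dx_n. \tag{a}$$

Now let us find the moment generating function of $\chi^2$,

$$\begin{aligned}
\phi(\theta) &= E\bigl(e^{\theta\sum_i x_i^2}\bigr) = \frac{1}{(2\pi)^{n/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} e^{\theta\sum_i x_i^2 - \frac12\sum_i x_i^2}\,dx_1\cdots dx_n\\
&= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac12(1-2\theta)x_i^2}\,dx_i = \Bigl(\frac{1}{\sqrt{1-2\theta}}\Bigr)^{n} = (1-2\theta)^{-\frac{n}{2}}
\end{aligned} \tag{b}$$

for $\theta < \tfrac12$.

But this is the moment generating function of the Pearson Type III distribution ((e) of §3.3) with its parameters chosen so that it reduces to $f_n(\chi^2)$. Therefore by uniqueness Theorem (B), §2.81, we have

Theorem (A): If $O_n : x_1,x_2,\ldots,x_n$ is a sample from $N(0,1)$, the function $\chi^2 = \sum_{i=1}^{n} x_i^2$ is distributed according to the $\chi^2$-law with $n$ degrees of freedom, i.e.,

$$f_n(\chi^2)\,d(\chi^2) = \frac{(\chi^2)^{\frac{n}{2}-1}\, e^{-\frac12\chi^2}}{2^{\frac{n}{2}}\,\Gamma(\frac{n}{2})}\,d(\chi^2).$$

From this result it follows that, if $x_1,x_2,\ldots,x_n$ are distributed independently according to $N(a,\sigma^2)$, then $\chi^2 = \sum_{i=1}^{n}(x_i-a)^2/\sigma^2$ is distributed according to $f_n(\chi^2)\,d(\chi^2)$.

We can readily determine the moments of the $\chi^2$ distribution from its moment generating function. We expand $\phi(\theta)$ in a power series

$$\phi(\theta) = 1 + \frac{n}{2}\,(2\theta) + \frac{\frac{n}{2}(\frac{n}{2}+1)}{2!}\,(2\theta)^2 + \cdots + \frac{\frac{n}{2}(\frac{n}{2}+1)\cdots(\frac{n}{2}+h-1)}{h!}\,(2\theta)^h + \cdots.$$

Then we find the moments about zero

$$E[(\chi^2)^h] = \frac{\partial^h\phi}{\partial\theta^h}\Big|_{\theta=0} = 2^h\,\frac{n}{2}\Bigl(\frac{n}{2}+1\Bigr)\cdots\Bigl(\frac{n}{2}+h-1\Bigr).$$

The mean is $n$ and the variance is

$$\sigma^2 = n(n+2) - n^2 = 2n.$$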
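The moment formula can be evaluated exactly for any number of degrees of freedom. In the sketch below the function name and the value $n = 7$ are illustrative assumptions; exact fractions are used so that the mean $n$ and variance $2n$ come out without rounding.

```python
from fractions import Fraction

def chi_square_raw_moment(n, h):
    """E[(chi^2)^h] = 2^h (n/2)(n/2+1)...(n/2+h-1) for n degrees of freedom."""
    value = Fraction(2) ** h
    for j in range(h):
        value *= Fraction(n, 2) + j
    return value

n = 7
mean = chi_square_raw_moment(n, 1)
variance = chi_square_raw_moment(n, 2) - mean ** 2
print(mean, variance)   # 7 14
```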


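5.22 Distribution of the Exponent in a Multivariate Normal Distribution

Now let us consider the normal multivariate distribution of $k$ variates with zero means

$$\frac{\sqrt{|A_{ij}|}}{(2\pi)^{k/2}}\; e^{-\frac12\sum_{i,j=1}^{k} A_{ij}x_ix_j}\,dx_1\cdots dx_k, \tag{a}$$

and let us find the distribution of the quadratic form $\sum_{i,j=1}^{k} A_{ij}x_ix_j$. To do this we find the moment generating function of the quadratic form,

$$\begin{aligned}
\phi(\theta) &= E\bigl(e^{\theta\sum A_{ij}x_ix_j}\bigr) = \frac{\sqrt{|A_{ij}|}}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} e^{\theta\sum A_{ij}x_ix_j - \frac12\sum A_{ij}x_ix_j}\,dx_1\cdots dx_k\\
&= \frac{\sqrt{|A_{ij}|}}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} e^{-\frac12\sum (1-2\theta)A_{ij}x_ix_j}\,dx_1\cdots dx_k.
\end{aligned} \tag{b}$$

It follows from §3.23 that

$$\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} e^{-\frac12\sum B_{ij}x_ix_j}\,dx_1\cdots dx_k = \frac{(2\pi)^{k/2}}{\sqrt{|B_{ij}|}};$$

the above integration yields

$$\frac{\sqrt{|A_{ij}|}}{\sqrt{|(1-2\theta)A_{ij}|}} = \frac{\sqrt{|A_{ij}|}}{(1-2\theta)^{k/2}\sqrt{|A_{ij}|}}.$$

That is,

$$\phi(\theta) = (1-2\theta)^{-\frac{k}{2}},$$

which, as will be seen from (e) in §3.3, is the m. g. f. for a $\chi^2$ distribution with $k$ degrees of freedom.

We therefore have

Theorem (A): If $x_1,x_2,\ldots,x_k$ are distributed according to the normal multivariate law (a), then $\sum_{i,j=1}^{k} A_{ij}x_ix_j = \chi^2$, say, is distributed according to $f_k(\chi^2)$.

More generally, the quadratic form $\sum_{i,j=1}^{k} A_{ij}(x_i-a_i)(x_j-a_j)$ from the distribution

$$\frac{\sqrt{|A_{ij}|}}{(2\pi)^{k/2}}\; e^{-\frac12\sum_{i,j=1}^{k} A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k$$

has the $\chi^2$ distribution with $k$ degrees of freedom.

5.23 Reproductive Property of $\chi^2$-Distribution

In the same way that the normal distribution possesses the reproductive property, so also does the $\chi^2$-distribution. Suppose we have $\chi_1^2,\chi_2^2,\ldots,\chi_k^2$ distributed according to $f_{m_1}(\chi_1^2), f_{m_2}(\chi_2^2), \ldots, f_{m_k}(\chi_k^2)$, respectively. From the joint distribution of these variates, let us find the moment generating function of the sum $\sum_{i=1}^{k}\chi_i^2$, assuming independence,

$$\phi(\theta) = E\bigl(e^{\theta\sum_i \chi_i^2}\bigr) = \prod_{i=1}^{k}\int_0^{\infty} e^{\theta\chi_i^2} f_{m_i}(\chi_i^2)\,d(\chi_i^2) = \prod_{i=1}^{k}(1-2\theta)^{-\frac{m_i}{2}} = (1-2\theta)^{-\frac{m}{2}},$$

where $m = \sum_{i=1}^{k} m_i$. $\phi(\theta)$ is the m. g. f. for a $\chi^2$ distribution with $m$ degrees of freedom. Therefore, we have the following

Theorem (A): If $\chi_1^2,\chi_2^2,\ldots,\chi_k^2$ are independently distributed according to $\chi^2$-laws with $m_1,m_2,\ldots,m_k$ degrees of freedom respectively, then $\sum_{i=1}^{k}\chi_i^2$ is distributed according to a $\chi^2$-law with $\sum_{i=1}^{k} m_i$ degrees of freedom.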

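5.24 Cochran's Theorem

Cochran's theorem states certain conditions under which a set of quadratic forms are independently distributed according to $\chi^2$-laws if the variables of the quadratic forms are independently distributed, each according to $N(0,1)$. To prove this theorem, we need several algebraic theorems which will be stated as lemmas.

Lemma 1: If $q$ is a quadratic form, $\sum_{\alpha,\beta=1}^{n} a_{\alpha\beta}x_\alpha x_\beta$, of order $n$ and rank $r$, there exists a linear transformation $z_\alpha = \sum_{\beta=1}^{n} b_{\alpha\beta}x_\beta$ $(\alpha = 1,2,\ldots,r)$ such that $q = \sum_{\alpha=1}^{r} c_\alpha z_\alpha^2$, where the $c_\alpha$ are $+1$ or $-1$.

In §3.23 we exhibited a linear transformation that would do this for a positive definite quadratic form. The reader may extend that demonstration to prove Lemma 1.*

*A proof of Lemma 1 is given in M. Bôcher, Introduction to Higher Algebra, Macmillan, New York, 1907.

Lemma 2: If $\sum_{\alpha,\beta=1}^{n} a_{\alpha\beta}x_\alpha x_\beta$ is transformed into $\sum_{\alpha,\beta=1}^{n} a'_{\alpha\beta}z_\alpha z_\beta$ by a linear transformation $x_\alpha = \sum_{\beta=1}^{n} c_{\alpha\beta}z_\beta$, then

$$|a'_{\alpha\beta}| = |a_{\alpha\beta}|\cdot|c_{\alpha\beta}|^2.$$

This lemma can be readily verified from the fact that $a'_{\alpha\beta} = \sum_{\gamma,\delta=1}^{n} c_{\gamma\alpha}\,a_{\gamma\delta}\,c_{\delta\beta}$, using the rule for multiplying determinants.

Lemma 3: Suppose we have $k$ quadratic forms $q_1,q_2,\ldots,q_k$ in $x_1,x_2,\ldots,x_n$ of ranks $n_1,n_2,\ldots,n_k$, respectively, and suppose $\sum_{i=1}^{k} q_i = \sum_{\alpha=1}^{n} x_\alpha^2$. Then a necessary and sufficient condition that there exist a non-singular linear transformation $z_\alpha = \sum_{\beta=1}^{n} c_{\alpha\beta}x_\beta$ $(\alpha = 1,2,\ldots,\sum_{i=1}^{k} n_i)$ such that

$$q_1 = z_1^2 + \cdots + z_{n_1}^2,\quad q_2 = z_{n_1+1}^2 + \cdots + z_{n_1+n_2}^2,\quad \ldots,\quad q_k = z_{n_1+\cdots+n_{k-1}+1}^2 + \cdots + z_{n_1+\cdots+n_k}^2,$$

is that $n = n_1 + n_2 + \cdots + n_k$.

Proof: The necessity condition is obvious since $\sum_{i=1}^{k} n_i$ must be equal to $n$ in order for the transformation to be non-singular.

Now consider the sufficiency condition. We assume $n = n_1 + n_2 + \cdots + n_k$. By Lemma 1 there is a linear transformation $y_\alpha^{(1)} = \sum_{\beta=1}^{n} b_{\alpha\beta}^{(1)}x_\beta$ $(\alpha = 1,\ldots,n_1)$ such that $q_1 = \sum_{\alpha=1}^{n_1} c_\alpha\bigl(y_\alpha^{(1)}\bigr)^2$, where $c_\alpha = +1$ or $-1$. In the same way we know there exist transformations

$$y_\alpha^{(2)} = \sum_{\beta=1}^{n} b_{\alpha\beta}^{(2)}x_\beta \quad (\alpha = n_1+1,\ldots,n_1+n_2),\quad \ldots,\quad y_\alpha^{(k)} = \sum_{\beta=1}^{n} b_{\alpha\beta}^{(k)}x_\beta \quad (\alpha = n_1+\cdots+n_{k-1}+1,\ldots,n),$$

such that

$$q_2 = \sum_{\alpha=n_1+1}^{n_1+n_2} c_\alpha\bigl(y_\alpha^{(2)}\bigr)^2,\quad \ldots,\quad q_k = \sum_{\alpha=n_1+\cdots+n_{k-1}+1}^{n} c_\alpha\bigl(y_\alpha^{(k)}\bigr)^2.$$

In other words we have $n_1$ linear forms $y_\alpha^{(1)}$ $(\alpha = 1,\ldots,n_1)$ such that $q_1 = \sum_{\alpha=1}^{n_1} c_\alpha\bigl(\sum_{\beta} b_{\alpha\beta}^{(1)}x_\beta\bigr)^2$; we have $n_2$ linear forms such that $q_2 = \sum_{\alpha=n_1+1}^{n_1+n_2} c_\alpha\bigl(\sum_{\beta} b_{\alpha\beta}^{(2)}x_\beta\bigr)^2$; and so on.

Let us denote $y_\alpha^{(1)}$ by $z_\alpha$ for $\alpha = 1,2,\ldots,n_1$, $y_\alpha^{(2)}$ by $z_\alpha$ for $\alpha = n_1+1,\ldots,n_1+n_2$, etc. Let us denote $b_{\alpha\beta}^{(1)}$ by $c_{\alpha\beta}$ for $\alpha = 1,\ldots,n_1$ $(\beta = 1,\ldots,n)$, $b_{\alpha\beta}^{(2)}$ by $c_{\alpha\beta}$ for $\alpha = n_1+1,\ldots,n_1+n_2$ $(\beta = 1,\ldots,n)$, etc. Combining all of the linear transformations, we may write

$$z_\alpha = \sum_{\beta=1}^{n} c_{\alpha\beta}x_\beta \qquad (\alpha = 1,\ldots,n).$$

Then $q_1 = \sum_{\alpha=1}^{n_1} c_\alpha z_\alpha^2$, $q_2 = \sum_{\alpha=n_1+1}^{n_1+n_2} c_\alpha z_\alpha^2$, etc., and

$$\sum_{\alpha=1}^{n} x_\alpha^2 = \sum_{i=1}^{k} q_i = \sum_{\alpha=1}^{n} c_\alpha z_\alpha^2 = \sum_{\alpha=1}^{n} c_\alpha\Bigl(\sum_{\beta=1}^{n} c_{\alpha\beta}x_\beta\Bigr)^2.$$

By Lemma 2, $|\delta_{\alpha\beta}| = |c_\alpha\delta_{\alpha\beta}|\cdot|c_{\alpha\beta}|^2$ (where $\delta_{\alpha\beta}$ is 1 if $\alpha = \beta$ and is 0 if $\alpha \ne \beta$). This reduces to

$$1 = \Bigl(\prod_{\alpha=1}^{n} c_\alpha\Bigr)\cdot|c_{\alpha\beta}|^2.$$

Since the $c_\alpha = \pm1$, this equation is

$$1 = \pm1\cdot|c_{\alpha\beta}|^2,$$

and because the $c_{\alpha\beta}$ are real, $|c_{\alpha\beta}| = \pm1$.

This fact tells us that the $n$ linear forms are independent and constitute a non-singular linear transformation. From the identity

$$\sum_{\alpha=1}^{n} x_\alpha^2 = \sum_{\alpha=1}^{n} c_\alpha z_\alpha^2$$

we deduce that $\sum_\alpha c_\alpha z_\alpha^2$ is positive definite since $\sum_\alpha x_\alpha^2$ is positive definite. Hence, each $c_\alpha = +1$. This proves the sufficiency of the condition $n = n_1 + n_2 + \cdots + n_k$. It is interesting to observe that $|c_{\alpha\beta}| = \pm1$ and that $\sum_{\gamma=1}^{n} c_{\alpha\gamma}c_{\beta\gamma} = \delta_{\alpha\beta}$, that is, the transformation is orthogonal.

Cochran's theorem follows readily from this algebraic theorem.

Theorem (A) (Cochran's Theorem): If $x_\alpha$ $(\alpha = 1,2,\ldots,n)$ are independently distributed according to $N(0,1)$ and $\sum_{\alpha=1}^{n} x_\alpha^2 = \sum_{i=1}^{k} q_i$, where $q_i$ is a quadratic form of rank $n_i$, a necessary and sufficient condition that the $q_i$ be independently distributed according to $f_{n_i}(\chi^2)$ is that $\sum_{i=1}^{k} n_i = n$.

Proof: Assume $\sum_{i=1}^{k} n_i = n$, and find the m. g. f. of the $q_i$. We have

$$\phi(\theta_1,\ldots,\theta_k) = E\bigl(e^{\sum_i \theta_iq_i}\bigr) = \frac{1}{(2\pi)^{n/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} e^{-\frac12\sum_\alpha x_\alpha^2 + \sum_i \theta_iq_i}\,\prod_{\alpha=1}^{n} dx_\alpha.$$

Now transform the $x$'s to $z$'s by Lemma 3, noting that the Jacobian is unity.

$$\phi = \frac{1}{(2\pi)^{n/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} e^{-\frac12\bigl[(1-2\theta_1)(z_1^2+\cdots+z_{n_1}^2) + \cdots + (1-2\theta_k)(z_{n-n_k+1}^2+\cdots+z_n^2)\bigr]}\,\prod_{\alpha=1}^{n} dz_\alpha = \prod_{i=1}^{k}(1-2\theta_i)^{-\frac{n_i}{2}},$$

which is the m. g. f. of $k$ independent $\chi^2$ distributions with $n_1,n_2,\ldots,n_k$ degrees of freedom, thus establishing the sufficiency condition.

The converse assumes that

$$\prod_{i=1}^{k}(1-2\theta_i)^{-\frac{n_i}{2}} = \frac{1}{(2\pi)^{n/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} e^{-\frac12\sum_\alpha x_\alpha^2 + \sum_i \theta_iq_i}\,\prod_{\alpha=1}^{n} dx_\alpha.$$

Since $\sum_{i=1}^{k} q_i = \sum_{\alpha=1}^{n} x_\alpha^2$, the right hand side of the equation becomes the m. g. f. of $\sum_{\alpha=1}^{n} x_\alpha^2$ (which has a $\chi^2$ distribution with $n$ degrees of freedom) when $\theta_1 = \theta_2 = \cdots = \theta_k = \theta$. So the equation becomes

$$\prod_{i=1}^{k}(1-2\theta)^{-\frac{n_i}{2}} = (1-2\theta)^{-\frac{n}{2}},$$

that is,

$$(1-2\theta)^{-\frac12\sum_i n_i} = (1-2\theta)^{-\frac{n}{2}}.$$

Hence $\sum_{i=1}^{k} n_i = n$, and the theorem is proved.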


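5.25 Independence of Mean and Sum of Squared Deviations from Mean in Samples from a Normal Population

As an application of Cochran's Theorem, we shall show that the sample mean and sum of squares of deviations about the mean in a sample from a normal population are independent and have $\chi^2$-distributions. Consider a sample $O_n : x_1,x_2,\ldots,x_n$ drawn from a normal population $N(0,1)$. Then

$$\sum_{\alpha=1}^{n} x_\alpha^2 = \sum_{\alpha=1}^{n}(x_\alpha-\bar{x})^2 + n\bar{x}^2.$$

Let

$$q_1 = \sum_{\alpha=1}^{n}(x_\alpha-\bar{x})^2 \quad\text{and}\quad q_2 = n\bar{x}^2.$$

$q_2$ is of rank 1, for in the matrix

$$\begin{pmatrix} \frac1n & \frac1n & \cdots & \frac1n\\ \frac1n & \frac1n & \cdots & \frac1n\\ \vdots & \vdots & & \vdots\\ \frac1n & \frac1n & \cdots & \frac1n \end{pmatrix}$$

any minor of order two,

$$\begin{vmatrix} \frac1n & \frac1n\\ \frac1n & \frac1n \end{vmatrix},$$

is zero, but each element is different from zero. The determinant of the matrix of $q_1$ is

$$\begin{vmatrix} 1-\frac1n & -\frac1n & \cdots & -\frac1n\\ -\frac1n & 1-\frac1n & \cdots & -\frac1n\\ \vdots & \vdots & & \vdots\\ -\frac1n & -\frac1n & \cdots & 1-\frac1n \end{vmatrix}.$$

Subtracting the first row from each of the others, we get

$$\begin{vmatrix} 1-\frac1n & -\frac1n & -\frac1n & \cdots & -\frac1n\\ -1 & 1 & 0 & \cdots & 0\\ -1 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & & \vdots\\ -1 & 0 & 0 & \cdots & 1 \end{vmatrix}.$$

Now add each column to the first and find

$$\begin{vmatrix} 0 & -\frac1n & -\frac1n & \cdots & -\frac1n\\ 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & 0 & 0 & \cdots & 1 \end{vmatrix} = 0,$$

for all the elements of the first column are zero. If we use this method of evaluation on any principal minor of order $n-1$, we get

$$\begin{vmatrix} \frac1n & -\frac1n & -\frac1n & \cdots & -\frac1n\\ 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & 0 & 0 & \cdots & 1 \end{vmatrix} = \frac1n \ne 0.$$

Hence the rank of $q_1$ is $n-1$. Using Cochran's Theorem we conclude that $\sum_{\alpha=1}^{n}(x_\alpha-\bar{x})^2$ and $n\bar{x}^2$ are independently distributed according to $f_{n-1}(\chi^2)$ and $f_1(\chi^2)$, respectively.

If $x$ is distributed according to $N(a,\sigma^2)$ then $\frac{x-a}{\sigma}$ is distributed according to $N(0,1)$. Hence, we have proved the following corollary to Cochran's Theorem:

If $O_n : x_1,\ldots,x_n$ is a sample from $N(a,\sigma^2)$, then $\sum_{\alpha=1}^{n}(x_\alpha-\bar{x})^2/\sigma^2$ and $n(\bar{x}-a)^2/\sigma^2$ are independently distributed according to $f_{n-1}(\chi^2)$ and $f_1(\chi^2)$, respectively. It also follows that $s^2 = \sum_{\alpha=1}^{n}(x_\alpha-\bar{x})^2/(n-1)$ and $\bar{x}$ are independently distributed.

It should be pointed out that one could establish the fact that $\sum_{\alpha=1}^{n}(x_\alpha-\bar{x})^2/\sigma^2$ and $\sqrt{n}\,(\bar{x}-a)/\sigma$ for a sample from $N(a,\sigma^2)$ are independently distributed according to $f_{n-1}(\chi^2)$ and $N(0,1)$, respectively, by verifying that the m. g. f.

$$\phi(\theta_1,\theta_2) = E\Bigl(e^{\theta_1\sum_\alpha\frac{(x_\alpha-\bar{x})^2}{\sigma^2} + \theta_2\frac{\sqrt{n}(\bar{x}-a)}{\sigma}}\Bigr) = (1-2\theta_1)^{-\frac{n-1}{2}}\, e^{\frac12\theta_2^2}.$$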
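The rank condition used in this application of Cochran's Theorem can be confirmed by exact elimination. The following is a sketch only: the helper name `rank` and the choice $n = 6$ are assumptions made for illustration. It builds the matrices of $q_1 = \sum(x_\alpha-\bar{x})^2$ and $q_2 = n\bar{x}^2$ and checks that their ranks add up to $n$.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

n = 6
# matrix of q2 = n * xbar^2: every element equals 1/n
q2 = [[Fraction(1, n)] * n for _ in range(n)]
# matrix of q1 = sum (x - xbar)^2: identity minus the matrix of q2
q1 = [[(1 if i == j else 0) - Fraction(1, n) for j in range(n)] for i in range(n)]
print(rank(q1), rank(q2))   # 5 1
```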


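5.3 The "Student" t-Distribution

Next we shall derive the distribution of the ratio of two independent variates, one normally distributed and the other distributed according to the $\chi^2$-law. Let $\xi$ be a variate distributed according to $N(0,1)$ and let $\chi^2$ be distributed according to $f_m(\chi^2)$. Since these are independently distributed, the joint probability element is

$$\frac{1}{\sqrt{2\pi}}\, e^{-\frac12\xi^2}\cdot\frac{(\chi^2)^{\frac{m}{2}-1}\, e^{-\frac12\chi^2}}{2^{\frac{m}{2}}\,\Gamma(\frac{m}{2})}\,d\xi\,d(\chi^2).$$

Let us change variables to

$$t = \frac{\xi\sqrt{m}}{\sqrt{\chi^2}},\quad -\infty < t < \infty, \qquad u = \chi^2,\quad 0 < u < \infty.$$

Then

$$\xi = \frac{t\sqrt{u}}{\sqrt{m}}, \qquad \chi^2 = u.$$

The Jacobian of this transformation is

$$J = \sqrt{\frac{u}{m}}.$$

Hence the joint distribution of $t$ and $u$ is

$$\frac{1}{\sqrt{2\pi m}\;2^{\frac{m}{2}}\,\Gamma(\frac{m}{2})}\; u^{\frac{m-1}{2}}\, e^{-\frac{u}{2}\left(1+\frac{t^2}{m}\right)}\,du\,dt.$$

To find the marginal distribution of $t$, we integrate out $u$.

$$g_m(t)\,dt = \frac{dt}{\sqrt{2\pi m}\;2^{\frac{m}{2}}\,\Gamma(\frac{m}{2})}\int_0^{\infty} u^{\frac{m-1}{2}}\, e^{-\frac{u}{2}\left(1+\frac{t^2}{m}\right)}\,du = \frac{\Gamma(\frac{m+1}{2})}{\sqrt{\pi m}\;\Gamma(\frac{m}{2})}\Bigl(1+\frac{t^2}{m}\Bigr)^{-\frac{m+1}{2}}dt.$$

$g_m(t)$ is called the "Student" t-distribution with $m$ degrees of freedom. Values of $t_\epsilon$ have been tabulated such that

$$\int_{-t_\epsilon}^{t_\epsilon} g_m(t)\,dt = \epsilon,$$

for several values of $\epsilon$ (including .95, .98 and .99) and of $m$, in R. A. Fisher's Statistical Methods for Research Workers.

The application of this distribution to sampling theory is immediate. As an important application consider a sample $O_n$ from $N(a,\sigma^2)$. Then

$$\xi = \frac{(\bar{x}-a)\sqrt{n}}{\sigma}$$

is distributed according to $N(0,1)$ and

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{\sum_{\alpha=1}^{n}(x_\alpha-\bar{x})^2}{\sigma^2}$$

is independently distributed according to $f_{n-1}(\chi^2)$. The ratio

$$t = \frac{\xi\sqrt{n-1}}{\sqrt{\chi^2}} = \frac{(\bar{x}-a)\sqrt{n}}{s}$$

is, therefore, distributed according to $g_{n-1}(t)\,dt$.

The quantity $t$ and its sampling theory, which marked a new step in statistical inference, were first investigated by Gosset, who, without rigorously proving his result, suggested the above distribution of $t$ in a paper published in 1908 under the name of "Student". A rigorous proof was supplied by R. A. Fisher in 1926. The essential feature of $t$ is that both it and its distribution are functionally independent of $\sigma$.

The "Student" distribution may also be used in connection with two samples. Let $O_{n_1}(x_{1\alpha};\ \alpha = 1,2,\ldots,n_1)$ and $O_{n_2}(x_{2\alpha};\ \alpha = 1,2,\ldots,n_2)$ be samples from $N(a_1,\sigma^2)$ and $N(a_2,\sigma^2)$, respectively. Let $\bar{x}_1$ and $\bar{x}_2$ be sample means and $s_1^2 = \sum_\alpha(x_{1\alpha}-\bar{x}_1)^2/(n_1-1)$ and $s_2^2 = \sum_\alpha(x_{2\alpha}-\bar{x}_2)^2/(n_2-1)$. Then

$$\xi = \frac{(\bar{x}_1-\bar{x}_2)-(a_1-a_2)}{\sigma\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$$

is distributed according to $N(0,1)$ and

$$\chi^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{\sigma^2}$$

is distributed independently according to $f_{n_1+n_2-2}(\chi^2)$. Hence, the ratio

$$t = \frac{\xi\sqrt{n_1+n_2-2}}{\sqrt{\chi^2}} = \frac{(\bar{x}_1-\bar{x}_2)-(a_1-a_2)}{\sqrt{\dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}\Bigl(\dfrac{1}{n_1}+\dfrac{1}{n_2}\Bigr)}}$$

is distributed according to $g_{n_1+n_2-2}(t)$.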
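The density $g_m(t)$ and the one-sample ratio $t = (\bar{x}-a)\sqrt{n}/s$ are simple to evaluate. The sketch below is illustrative (the function names, the integration range and the sample values are assumptions); it checks numerically that $g_5(t)$ integrates to approximately 1 and that $g_1(0) = 1/\pi$.

```python
import math

def student_t_pdf(t, m):
    """g_m(t) = Gamma((m+1)/2) / (sqrt(pi*m) Gamma(m/2)) * (1 + t^2/m)^(-(m+1)/2)."""
    log_const = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
                 - 0.5 * math.log(math.pi * m))
    return math.exp(log_const - 0.5 * (m + 1) * math.log1p(t * t / m))

def t_statistic(sample, a):
    """t = (xbar - a) sqrt(n) / s with s^2 = sum (x - xbar)^2 / (n - 1)."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (xbar - a) * math.sqrt(n) / math.sqrt(s2)

# Trapezoidal check that g_5 integrates to about 1 over a wide range
step = 0.01
grid = [-60.0 + step * k for k in range(12001)]
area = step * (sum(student_t_pdf(t, 5) for t in grid)
               - 0.5 * (student_t_pdf(grid[0], 5) + student_t_pdf(grid[-1], 5)))
print(area, student_t_pdf(0.0, 1), 1.0 / math.pi)
print(t_statistic([1, 2, 3, 4, 5], 2))
```

For the assumed sample 1, 2, 3, 4, 5 and $a = 2$ we have $\bar{x} = 3$ and $s^2 = 2.5$, so that $t = \sqrt{5}/\sqrt{2.5} = \sqrt{2}$, to be referred to $g_4(t)$.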


It can be verified by the reader that




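5.4 Snedecor's F-Distribution

Now let us consider the distribution of the ratio of two quantities independently distributed according to $\chi^2$-distributions. Let $\chi_1^2$ and $\chi_2^2$ be independently distributed according to $f_{m_1}(\chi_1^2)$ and $f_{m_2}(\chi_2^2)$, respectively. The joint distribution is

$$\frac{(\chi_1^2)^{\frac{m_1}{2}-1}(\chi_2^2)^{\frac{m_2}{2}-1}}{2^{\frac{m_1+m_2}{2}}\,\Gamma(\frac{m_1}{2})\,\Gamma(\frac{m_2}{2})}\; e^{-\frac12(\chi_1^2+\chi_2^2)}\,d(\chi_1^2)\,d(\chi_2^2).$$

Let us make the change of variables

$$F = \frac{\chi_1^2/m_1}{\chi_2^2/m_2},\quad 0 < F < \infty, \qquad v = \chi_2^2,\quad 0 < v < \infty.$$

Then

$$\chi_1^2 = \frac{m_1}{m_2}\,vF, \qquad \chi_2^2 = v,$$

and the Jacobian of the transformation is

$$J = \frac{m_1}{m_2}\,v.$$

The distribution of the transformed variates is

$$\frac{\bigl(\frac{m_1}{m_2}\bigr)^{\frac{m_1}{2}}}{2^{\frac{m_1+m_2}{2}}\,\Gamma(\frac{m_1}{2})\,\Gamma(\frac{m_2}{2})}\; F^{\frac{m_1}{2}-1}\, v^{\frac{m_1+m_2}{2}-1}\, e^{-\frac{v}{2}\left(1+\frac{m_1}{m_2}F\right)}\,dF\,dv.$$

Integrating out the extraneous variable $v$, we get the distribution of $F$,

$$h_{m_1,m_2}(F)\,dF = \frac{\Gamma(\frac{m_1+m_2}{2})}{\Gamma(\frac{m_1}{2})\,\Gamma(\frac{m_2}{2})}\Bigl(\frac{m_1}{m_2}\Bigr)^{\frac{m_1}{2}} F^{\frac{m_1}{2}-1}\Bigl(1+\frac{m_1}{m_2}F\Bigr)^{-\frac{m_1+m_2}{2}}dF. \tag{a}$$

This distribution, known as Snedecor's F-distribution with $m_1$ and $m_2$ degrees of freedom, will be denoted by $h_{m_1,m_2}(F)\,dF$.

Values of $F_\epsilon$ have been tabulated such that

$$\int_0^{F_\epsilon} h_{m_1,m_2}(F)\,dF = \epsilon$$

for $\epsilon = .99$, $.95$ and all combinations of $(m_1,m_2)$ from $(1,1)$ to $(12,30)$ and for certain combinations from $(14,32)$ to $(500,1000)$, in Snedecor's Statistical Methods.

The moments about zero are easily obtained. Since the above is a distribution function, the integral over the entire range of $F$ is unity, and, hence,

$$\int_0^{\infty} F^{\frac{m_1}{2}-1}\Bigl(1+\frac{m_1}{m_2}F\Bigr)^{-\frac{m_1+m_2}{2}}dF = \frac{\Gamma(\frac{m_1}{2})\,\Gamma(\frac{m_2}{2})}{\Gamma(\frac{m_1+m_2}{2})}\Bigl(\frac{m_2}{m_1}\Bigr)^{\frac{m_1}{2}}.$$

Using this fact, we get by integration,

$$E(F^r) = \frac{\Gamma(\frac{m_1+m_2}{2})}{\Gamma(\frac{m_1}{2})\,\Gamma(\frac{m_2}{2})}\Bigl(\frac{m_1}{m_2}\Bigr)^{\frac{m_1}{2}}\int_0^{\infty} F^{\frac{m_1}{2}-1+r}\Bigl(1+\frac{m_1}{m_2}F\Bigr)^{-\frac{m_1+m_2}{2}}dF = \Bigl(\frac{m_2}{m_1}\Bigr)^{r}\,\frac{\Gamma(\frac{m_1}{2}+r)\,\Gamma(\frac{m_2}{2}-r)}{\Gamma(\frac{m_1}{2})\,\Gamma(\frac{m_2}{2})} \tag{b}$$

for $r < \frac{m_2}{2}$.

By a simple change of variable the F-distribution may be changed into a Type I distribution (the integrand of the Beta function times a constant). Let

$$F = \frac{m_2}{m_1}\,\frac{x}{1-x}, \qquad dF = \frac{m_2}{m_1}\,\frac{dx}{(1-x)^2}.$$

So $h_{m_1,m_2}(F)\,dF$ transforms into

$$\frac{1}{B(\frac{m_1}{2},\frac{m_2}{2})}\; x^{\frac{m_1}{2}-1}(1-x)^{\frac{m_2}{2}-1}\,dx. \tag{c}$$

It should be pointed out that the square of Student's $t$ is simply distributed as

$$h_{1,m}(t^2)\,d(t^2).$$

If we make the change of variable

$$z = \tfrac12\log_e F,$$

we obtain R. A. Fisher's z-distribution.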
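Formula (b) for the moments can be evaluated through logarithms of the Gamma function. The sketch below is illustrative only (the function name and the degrees of freedom $m_1 = 5$, $m_2 = 10$ are assumptions); for $r = 1$ the formula reduces to $m_2/(m_2-2)$, which the code compares with the general expression.

```python
import math

def f_raw_moment(m1, m2, r):
    """E(F^r) = (m2/m1)^r Gamma(m1/2+r) Gamma(m2/2-r) / (Gamma(m1/2) Gamma(m2/2))."""
    if not r < m2 / 2:
        raise ValueError("moment exists only for r < m2/2")
    log_value = (r * math.log(m2 / m1)
                 + math.lgamma(m1 / 2 + r) + math.lgamma(m2 / 2 - r)
                 - math.lgamma(m1 / 2) - math.lgamma(m2 / 2))
    return math.exp(log_value)

m1, m2 = 5, 10
mean = f_raw_moment(m1, m2, 1)
variance = f_raw_moment(m1, m2, 2) - mean ** 2
print(mean, m2 / (m2 - 2))
print(variance)
```

The restriction $r < m_2/2$ in (b) appears in the code as an explicit check, since $\Gamma(\frac{m_2}{2}-r)$ is not defined for the moment otherwise.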


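Example 1: As an example of the applications of the F-distribution, consider two samples $O_{n_1} : (x_{1\alpha};\ \alpha = 1,2,\ldots,n_1)$ and $O_{n_2} : (x_{2\alpha};\ \alpha = 1,2,\ldots,n_2)$ from populations $N(a_1,\sigma_1^2)$ and $N(a_2,\sigma_2^2)$, respectively. Let

$$s_1^2 = \frac{\sum_{\alpha=1}^{n_1}(x_{1\alpha}-\bar{x}_1)^2}{n_1-1}, \qquad s_2^2 = \frac{\sum_{\alpha=1}^{n_2}(x_{2\alpha}-\bar{x}_2)^2}{n_2-1}.$$

Then

$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$

is distributed according to $h_{n_1-1,\,n_2-1}(F)\,dF$.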


Example 2: Suppose $O_{n_1}, O_{n_2}, \ldots, O_{n_k}$ are $k$ samples from $N(a_1,\sigma^2), N(a_2,\sigma^2), \ldots, N(a_k,\sigma^2)$, respectively. Then

$$\sum_{i=1}^{k}(n_i-1)s_i^2$$







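$$\frac{|A_{ij}|^{\frac{n}{2}}}{(2\pi)^n}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} e^{-\frac12\sum_{\alpha=1}^{n}\sum_{i,j=1}^{2}A_{ij}x_{i\alpha}x_{j\alpha} + \sum_{i,j=1}^{2}\theta_{ij}a_{ij}}\,\prod_{\alpha=1}^{n} dx_{1\alpha}\,dx_{2\alpha}
= \frac{|A_{ij}|^{\frac{n}{2}}}{(2\pi)^n}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} e^{-\frac12\sum_{i,j=1}^{2}\sum_{\alpha,\beta=1}^{n}B_{ij,\alpha\beta}\,x_{i\alpha}x_{j\beta}}\,\prod_{\alpha=1}^{n} dx_{1\alpha}\,dx_{2\alpha},$$

where

$$B_{ij,\alpha\beta} = A_{ij}\,\delta_{\alpha\beta} - 2\theta_{ij}\Bigl(\delta_{\alpha\beta} - \frac1n\Bigr).$$

The determinant of the matrix of the quadratic form $\sum B_{ij,\alpha\beta}\,x_{i\alpha}x_{j\beta}$ is of order $2n$ and is

$$|B_{ij,\alpha\beta}| = \begin{vmatrix} C & D & D & \cdots & D\\ D & C & D & \cdots & D\\ D & D & C & \cdots & D\\ \vdots & \vdots & \vdots & & \vdots\\ D & D & D & \cdots & C \end{vmatrix},$$

where $C$ is a $2\times2$ block of elements as follows:

$$\begin{matrix} A_{11} - 2\theta_{11}\bigl(1-\frac1n\bigr) & A_{12} - 2\theta_{12}\bigl(1-\frac1n\bigr)\\ A_{21} - 2\theta_{21}\bigl(1-\frac1n\bigr) & A_{22} - 2\theta_{22}\bigl(1-\frac1n\bigr) \end{matrix}$$

and $D$ is a $2\times2$ block of elements as follows:

$$\begin{matrix} \frac2n\theta_{11} & \frac2n\theta_{12}\\ \frac2n\theta_{21} & \frac2n\theta_{22} \end{matrix}$$

If in $|B_{ij,\alpha\beta}|$ the first row of elements is subtracted from the third, fifth, etc., and the second row is subtracted from the fourth, sixth, etc., and if in the resulting determinant to the first column is added the third, fifth, etc., and to the second column is added the fourth, sixth, etc., we find that

$$|B_{ij,\alpha\beta}| = |A_{ij}|\cdot|A_{ij}-2\theta_{ij}|^{\,n-1},$$

so that the integral has the value $(2\pi)^n\,|A_{ij}|^{-\frac12}\,|A_{ij}-2\theta_{ij}|^{-\frac{n-1}{2}}$, which exists if the $\theta_{ij}$ are sufficiently small. Hence the m. g. f. of the $a_{11}$, $a_{22}$ and $2a_{12}$ is

$$\phi(\theta_{11},\theta_{12},\theta_{22}) = |A_{ij}|^{\frac{n-1}{2}}\;|A_{ij}-2\theta_{ij}|^{-\frac{n-1}{2}}.$$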


Now If we can find a function f (a^ ^ .a^g.agg) such that 


Ze. .a. 


<>(611,612,622) - in e ^•5f(aii,ai2.a22)daiida,2‘^22» 


where R la the region In the space of the a^^j for which an >0, > 0 , -1 < . 

<1, then f(aii,ai2.ag2) will b® the p. d. f. of the The uniqueness of the solution 

can be argued from the multivariate analotrue of Theorem (B) of §2.81. 

Denoting lA^^jl by and Aj^j - 26^^^ by and choosing values of 

the small enougji for ll^ijll to be positive definite, we can write 


n-1 . n-1 1 , n-i > n-i 

l\jl' " -^l' " *22 ^ ^ > 


S J ' 
1^22 


and we can expand (l-k®)"^"^/® Into the Infinite series ti / P'dln- 1)/2 [rtn-i)/2 + l)/i.']k®^. 

Hence we may write 

• 1) 


« 1 00 a-, 2 

T '^“2” * ^> ( ^ 2 > 

■ \ n^ii . i) ' 

0 ^ 


+ 1-1 


and a similar expression holds for K, 


-8n-i)/2 + 1) 


. Therefore, we may write 





n~3 . ii a _ il 


^ (®11*22)^Tr2i 1 . 

4 A "“"'“'Tr""'*"" Aa M I fl>a a 8»^ 


0 0* ' 2 


A 2 . 2^11*11 ^^22^22 1-0 5 ^.T22^ 

n^)* ‘ ’ • ' irsiTT, 


If In t ] we make use of the formula IJ - feDrtfr/ 2 ®^(l + -g) we may write [ ] as 

(a^g)^ Hi + 1) 


^ VTr«2l)! P + 1 ) 


But from the definition of the Beta f\mctlon, $3«3, 


Ri -i) 


1 ( 

Sii+1) R^i^) J 


1 - J- 

t ^(i-t) ® dt 


Therefore 






i 1-0 


( V a a )^^7i 

[ Va,^agg) A,g r2i(l-rg) g dr 


■(^) ^ ^0 


( )<a,,a )’^aJg 1 2v" 


Sli r r^lT^ T Sli 

rJ(l-r2) 2 dr - - ■ ^ ■. e ’’ 2 

VWRS^) .I 


since terms for odd values of $j$ vanish upon integration. Making use of this value of [ ]
in (e) we have


00 00 1 


(S) 1 2*^22 ^ 


r r ..*2,1 .n- 1 „ , n-t 


0 0-1 


2 " 2 


f i^i i®n ■ i^22®a2+^^^22^l 2^^ ^da^gdr 


Setting $r\sqrt{a_{11}a_{22}} = a_{12}$, we see that $\phi(\theta_{11},\theta_{12},\theta_{22})$ may be expressed as (b), where


(h)  $f(a_{11},a_{12},a_{22}) = \dfrac{|A_{ij}|^{\frac{n-1}{2}}\,(a_{11}a_{22} - a_{12}^2)^{\frac{n-4}{2}}\, e^{-\frac{1}{2}\sum_{i,j} A_{ij}a_{ij}}}{2^{\,n-1}\sqrt{\pi}\;\Gamma(\frac{n-1}{2})\,\Gamma(\frac{n-2}{2})}.$

As we mentioned earlier, the uniqueness of this p. d. f. may be argued from the
multivariate analogue of Theorem (B), §2.81.

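The sampling distribution of the correlation coefficient $r$ may be found by setting
$a_{12} = r\sqrt{a_{11}a_{22}}$ in (h), expanding $e^{-A_{12} r\sqrt{a_{11}a_{22}}}$ into an infinite
series, and integrating with respect to $a_{11}$ and $a_{22}$; we obtain as the probability
element of $r$

(i)  $f(r)\,dr = \dfrac{(1-\rho^2)^{\frac{n-1}{2}}(1-r^2)^{\frac{n-4}{2}}}{\sqrt{\pi}\;\Gamma(\frac{n-1}{2})\,\Gamma(\frac{n-2}{2})} \sum_{j=0}^{\infty} \dfrac{(2\rho r)^j}{j!}\,\Gamma^2\!\left(\tfrac{n-1+j}{2}\right) dr,$

where $\rho = -A_{12}/\sqrt{A_{11}A_{22}}$ is the correlation coefficient of the population.
If $\rho = 0$, the distribution of $r$ is simply

(j)  $f(r)\,dr = \dfrac{\Gamma(\frac{n-1}{2})}{\sqrt{\pi}\;\Gamma(\frac{n-2}{2})}\,(1-r^2)^{\frac{n-4}{2}}\, dr.$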


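The distribution (h) may be generalized to the case of a sample from a k-variate
normal distribution given by (b) in §3.23. The distribution for the k-variate case,
which will be derived in Chapter XI, is

(k)  $f(a_{ij}) = \dfrac{|A_{ij}|^{\frac{n-1}{2}}\,|a_{ij}|^{\frac{n-k-2}{2}}\, e^{-\frac{1}{2}\sum_{i,j} A_{ij}a_{ij}}}{2^{\frac{(n-1)k}{2}}\,\pi^{\frac{k(k-1)}{4}}\,\prod_{i=1}^{k}\Gamma(\frac{n-i}{2})},$

where $a_{ij} = \sum_{\alpha=1}^{n}(x_{i\alpha}-\bar{x}_i)(x_{j\alpha}-\bar{x}_j)$, $x_{i\alpha}$
($i = 1,2,\dots,k$; $\alpha = 1,2,\dots,n$) being the sample. Clearly, $n > k$ for this
distribution to exist.

This is a very important distribution function and is fundamental in the theory
of normal multivariate statistical analysis. It is known as the Wishart distribution.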


5.6 Independence of Second Order Moments and Means in Samples from a Normal
Multivariate Distribution

In §5.25 it was shown that in samples of size $n$ from a normal distribution
$N(a,\sigma^2)$, the quantities $\sum_{\alpha}(x_\alpha-\bar{x})^2/\sigma^2$ and $\sqrt{n}(\bar{x}-a)/\sigma$ are
independently distributed according to $f_{n-1}(\chi^2)$ and $N(0,1)$, respectively.

In the case of samples of size n from the k-varlate normal distribution (b). 





53 . 85 , the two seta of quantities ^ J " 1 . 2, . . .,10 and Xj,(l - 1 , 2 ) 

are Independently distributed according to (k), 55.5, and (e), 55.1^ respectively. A 
straight forward method of establishing the Independence of the two systems is by evalu- 
ating the characteristic function of a^^j^ and 2aj^j and (Xj^-aj^j\^: 

k k _ 

•K^lj,®!) - E(e ’ * ’ ), 


where which tuma out to be a product of the form 

- nil 

. 2 i . 2 ^1 


A lAjj- 29jjl 


" . 9 ' 


CHAPTER VI 


ON THE THEORY OF STATISTICAL ESTIMATION 

Let $O_n$ be a sample from a population whose c.d.f. depends on $h$ parameters
$\theta_1, \theta_2, \dots, \theta_h$. Suppose the functional form of the c.d.f. is known, but the true
values of the parameters are unknown. A fundamental problem in the theory of statistical
estimation is the following: On the basis of the evidence of $O_n$, can we assign an interval
for one of the parameters, say $\theta_1$, and then state with a given amount of confidence (the
meaning of this phrase will have to be defined) that the true value of $\theta_1$ lies in this
interval? More generally, can we make similar statements regarding a subset of the
parameters, say $\theta_1, \theta_2, \dots, \theta_m$, and a region in the parameter space? These problems are
discussed in §6.1. If instead of assigning on the basis of $O_n$ an interval of values in
which we estimate the true parameter value to be contained, we wish to assign a single
value, the problem is more difficult: We can hardly hope that our "point estimate" will
coincide exactly with the true value; in what sense can such an estimate be said to be
"good"? How can "good" estimates be found? These questions are considered in §6.2. Closely
related to the problem of point estimation of one or more parameters are questions of
curve fitting; these are taken up in §6.4.

The problems described above may be called parametric problems in statistical
estimation. There are also non-parametric cases of statistical estimation. One of these
is the problem of tolerance limits, which may be formulated as follows: Suppose a sample
$O_n$ is from a population in which the random variable $x$ is continuous. Can we determine
functions $L_1$ and $L_2$ of the x's in the sample such that we can state with a given
probability that $100\beta\%$ of the x's in the population will be included in the interval
$(L_1, L_2)$, no matter what the population distribution is? or no matter what the values of
the parameters are if the functional form of the distribution is known? This problem is
discussed in §6.3. Some of the underlying sampling theory is discussed in §4.55.

6.1 Confidence Intervals and Confidence Regions 

In this section we consider the estimation of one or more parameters by means of
statements that the parameter lies, or the parameters lie, in a certain region of the
parameter space. The discussion of the example of §6.11 should be carefully studied: while
this will not be repeated elsewhere, the analogous considerations pertain in every case
taken up in §§6.11-6.13.

6.11 Case in which the Distribution Depends on Only One Parameter

It will be clearest if we begin by means of an example (range of a rectangular
distribution): Let $R$ be the range of a sample $O_n$ from a population with the p. d. f.

    $f(x;\theta) = 1/\theta$, when $0 \le x \le \theta$, and $0$ otherwise.

It has been shown in §4.54 that the p. d. f. of $R$ is

    $f_n(R;\theta) = n(n-1)\,\theta^{-n} R^{\,n-2}(\theta - R), \quad 0 \le R \le \theta.$

If we introduce the function

    $\psi = R/\theta,$

we find that the distribution of this function of sample and parameter is independent of
the true value of the parameter; its p. d. f. is

    $g(\psi) = n(n-1)\,\psi^{\,n-2}(1-\psi), \quad 0 \le \psi \le 1.$

We pick a positive number $\epsilon < 1$ (it is customary to take $\epsilon = .95$ or $.99$) and define
$\psi_\epsilon$ from

    $\int_{\psi_\epsilon}^{1} g(\psi)\,d\psi = \epsilon.$

Then regardless of the true value of $\theta$,

    $\epsilon = \Pr(\psi_\epsilon \le \psi \le 1) = \Pr(\psi_\epsilon \le R/\theta \le 1),$

which is equivalent to the statement

(a)  $\Pr(R \le \theta \le R/\psi_\epsilon) = \epsilon.$

It should be noted that $R$ is the random variable in this statement and not $\theta$. The
interval $\delta$: $(R, R/\psi_\epsilon)$ is called a confidence interval for $\theta$, and $\epsilon$ is called the
confidence coefficient. Let us examine the significance of the probability statement (a):

First of all, (a) does not mean that if we take the value of $R$ from a specific
sample, say $R = R_1$, the probability that

(b)  $R_1 \le \theta \le R_1/\psi_\epsilon$

is $\epsilon$: For $\theta$ is not a random variable; it is a constant, even if unknown, and hence the
statement (b) is either true or false; if (b) is true the probability is unity, and if
false, zero; in no case is it $\epsilon$. The situation is analogous to the random drawing (with
replacement) of balls from the classical urn, in which the proportion of white balls is
$\epsilon$, of black balls, $1 - \epsilon$. After we have drawn a ball the randomness of the process is
over, the particular ball drawn is either black or white, and probability statements,
aside from the trivial one that $p = 0$ or $1$, are no longer possible. However, if we draw a
large number of balls we may expect that the percentage of white balls drawn will closely
approximate $100\epsilon$. More precisely: The law of large numbers (§3.11) tells us that the
proportion of white balls drawn converges stochastically to $\epsilon$ as the number of drawings
is increased.

We now see the practical significance of the probability statement (a): If we
always use confidence coefficient $\epsilon$ and always assert that the true value of the parameter
$\theta$ (it need not always be the same parameter) lies in the interval obtained by putting the
sample values into the confidence interval, then in the long run (i.e. in repeated
sampling) the percentage of correct statements can be expected to be very close to $100\epsilon$.
Again more precisely, we should say that the probability that the proportion of correct
statements departs from $\epsilon$ by more than a fixed amount $h > 0$ approaches zero as the number
of statements (i.e. number of samples) is increased, no matter how small $h$.
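The long-run interpretation of statement (a) can be checked numerically. The following is only a sketch under stated assumptions: the function names `psi_eps` and `coverage`, the bisection tolerance, the number of trials and the random seed are all illustrative choices, not part of the text. It uses the fact that integrating $g(\psi)$ gives the c.d.f. $n\psi^{n-1} - (n-1)\psi^n$, solves for $\psi_\epsilon$ by bisection, and counts how often the interval $(R, R/\psi_\epsilon)$ covers a fixed true $\theta$ in repeated sampling.

```python
import random


def psi_eps(n, eps, tol=1e-12):
    """Solve 1 - G(psi) = eps by bisection, where
    G(psi) = n*psi**(n-1) - (n-1)*psi**n is the c.d.f. of psi = R/theta."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        upper_tail = 1.0 - (n * mid ** (n - 1) - (n - 1) * mid ** n)
        if upper_tail > eps:  # tail still too large: psi_eps lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def coverage(theta, n, eps, trials=20000, seed=1):
    """Proportion of samples whose interval (R, R/psi_eps) covers theta."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    p = psi_eps(n, eps)
    hits = 0
    for _ in range(trials):
        xs = [rng.uniform(0.0, theta) for _ in range(n)]
        r = max(xs) - min(xs)  # the sample range R
        if r <= theta <= r / p:
            hits += 1
    return hits / trials
```

Calling `coverage(3.0, 5, 0.95)` and `coverage(10.0, 5, 0.95)` should both give proportions close to .95, illustrating that the statement holds whatever the true value of $\theta$.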

In general, if a distribution depends on one parameter $\theta$, and if we have two
functions $\underline{\theta}(O_n)$ and $\bar{\theta}(O_n)$ which depend on the sample $O_n$ but not on $\theta$, so that
the interval

    $\delta(O_n)$: $\underline{\theta}(O_n) \le \theta \le \bar{\theta}(O_n)$

is a random interval, then if the probability that the random interval $\delta$ cover the true
value of the parameter is $\epsilon$,

    $\Pr\{\theta \in \delta(O_n)\} = \epsilon,$

whatever be the true value $\theta$, we call $\delta(O_n)$ a confidence interval for $\theta$, and $\epsilon$ the
confidence coefficient. We shall sometimes refer to the pair $\underline{\theta}$, $\bar{\theta}$ of random variables
as confidence limits. This terminology is due to Neyman.

The method of finding confidence intervals that was employed in the example is
worth noting: It depends on finding a function $\psi$ of $O_n$ and $\theta$ whose distribution is
independent of $\theta$. If the function $\psi$ is monotone and continuous in $\theta$, then the relation

(c)  $\Pr(\psi_1 \le \psi \le \psi_2) = \epsilon$

can be inverted to read

    $\Pr(\theta \in \delta(O_n)) = \epsilon,$

where $\delta(O_n)$ is the confidence interval. Another perhaps more direct method of determining
confidence intervals is as follows:

Suppose $T(x_1,\dots,x_n)$ is a function of a sample $O_n$: $(x_1,x_2,\dots,x_n)$ from a
population with distribution element $f(x;\theta)dx$, such that the probability element of $T$
is $g(T;\theta)dT$. Suppose the range* of values of $T$ having non-zero probability density is
$(a,b)$, and suppose the range of possible values of $\theta$ is $(\alpha,\beta)$. Suppose two continuous
monotone increasing functions $T_1(\theta)$ and $T_2(\theta)$ exist such that

(d)  $\int_a^{T_2(\theta)} g(T;\theta)\,dT = p(1-\epsilon), \qquad \int_{T_1(\theta)}^{b} g(T;\theta)\,dT = q(1-\epsilon),$

where $p$ and $q$ are positive such that $p + q = 1$. Assume that $g(T;\theta)$ is such that $T_1$
and $T_2$ each ranges from $a$ to $b$ as $\theta$ ranges from $\alpha$ to $\beta$. Then for a given value of $T$,
let $\underline{\theta}$ and $\bar{\theta}$ be the values of $\theta$ for which $T_1(\theta) = T$, $T_2(\theta) = T$, respectively. Then
$(\underline{\theta},\bar{\theta})$ is a confidence interval for $\theta$ with confidence coefficient $\epsilon$. That $(\underline{\theta},\bar{\theta})$ is
a confidence interval follows from the relation

    $\Pr(T_2(\theta) \le T \le T_1(\theta)) = \epsilon,$

which, because of the continuous monotonic character of $T_2(\theta)$ and $T_1(\theta)$, may be inverted
and written as $\Pr(\underline{\theta} \le \theta \le \bar{\theta}) = \epsilon$. It should be noted that we may obtain confidence
limits for each value of $p$ if functions $T_2(\theta)$ and $T_1(\theta)$ of the required kind exist for
each $p$. The question arises as to which value of $p$ is "best". This would depend, of
course, on what definition of "best" we choose. In those cases where the mean value of the
length of the confidence interval is a function which factors in the form $h_1(p)h_2(\theta)$,
common sense suggests that we should choose $p$ so that the mean length is a minimum. In the
case of large samples, the definition of "best" confidence intervals is fairly direct
(see §6.12).

We may represent confidence intervals obtained by this process graphically as
follows:


*We permit $a$ or $\alpha$ to be $-\infty$, $b$ or $\beta$ to be $+\infty$.

Suppose the true value of $\theta$ is $\theta_0$. For any sample value of $T$, the corresponding
confidence interval is formed as follows: Draw a line parallel to the $\theta$-axis, defined by
$T$ = sample value. Let $A$, $B$ be the points of intersection of this line with the two
curves as indicated in the figure. The confidence interval is the projection of the
segment $AB$ on the $\theta$-axis. The confidence interval will cover the true value $\theta_0$ if and
only if the segment $AB$ crosses the line $\theta = \theta_0$, that is, if and only if $T$ falls in the
range $T_2(\theta_0)$, $T_1(\theta_0)$. The probability of $T$ falling in this interval is precisely $\epsilon$. We
thus have $\Pr(\underline{\theta} \le \theta_0 \le \bar{\theta}) = \epsilon$. The discussion and conclusion hold for any $\theta_0$ in the
range $(\alpha,\beta)$.

This method, for example, has been applied by R. A. Fisher to the problem of
determining the confidence limits for $\rho$ from the distribution (i), §5.5, of the
correlation coefficient $r$. Fisher uses the term fiducial limits instead of confidence
limits.

The idea involved in this method has also been applied to cases where $T$ is a
discrete random variable to obtain approximate confidence limits for the parameter
involved. In this case the = signs in the analogue of (d) for the discrete case are
replaced by $\le$ signs, and the largest value of $T_2(\theta)$ and smallest value of $T_1(\theta)$ are
obtained satisfying the inequalities. $T_1(\theta)$ and $T_2(\theta)$ will be step-functions and the
approximate confidence limits are obtained by drawing a smooth curve through the graphs of
the step-functions. For example, Clopper and Pearson (Biometrika, Vol. 26) have applied
the method to the problem of determining approximate confidence limits for the binomial
probability parameter $p$ from the statistic $x$ in the binomial distribution
$\binom{n}{x} p^x (1-p)^{n-x}$ ($x = 0, 1, 2, \dots, n$), and Ricker (Journal of the American Statistical
Association, Vol. 32 (1937), pp. 349-356) has applied the method to the Poisson
distribution $m^x e^{-m}/x!$, where $m$ is the parameter and $x$ the statistic. A method of
determining confidence limits for $\theta$ from large samples based on the likelihood function
is given in §6.12.
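For the binomial case the discrete analogue of (d) can be solved numerically instead of graphically. The sketch below is an illustration and not the procedure of the cited papers as printed: it takes $p = q = \tfrac{1}{2}$ in (d), and for an observed $x$ it finds by bisection the parameter values at which the two tail probabilities equal $(1-\epsilon)/2$. The function names and the tolerance are assumptions.

```python
from math import comb


def binom_cdf(k, n, p):
    """Pr(X <= k) for X distributed binomially with parameters n and p."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))


def solve_decreasing(f, target, tol=1e-10):
    """Root of f(p) = target on (0, 1), for f decreasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > target:  # still above the target: the root lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def binomial_limits(x, n, eps=0.95):
    """Limits for the binomial parameter from x successes in n trials."""
    tail = (1.0 - eps) / 2.0  # equal tails, i.e. p = q = 1/2 in (d)
    # Lower limit: Pr(X >= x) = tail, i.e. Pr(X <= x - 1) = 1 - tail.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1.0 - tail)
    # Upper limit: Pr(X <= x) = tail.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), tail)
    return lower, upper
```

For example, `binomial_limits(5, 10)` returns a pair of limits placed symmetrically about 0.5, as the symmetry of the binomial distribution requires.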

6.12 Confidence Limits from Large Samples

Suppose $x$ has c. d. f. $F(x,\theta)$, where $\theta$ is a parameter. Let $O_n$ be a sample of
size $n$ from a population having this c. d. f. Let $P(O_n,\theta)$ be the likelihood function,
i.e.

(a)  $P(O_n,\theta) = \prod_{i=1}^{n} f(x_i,\theta),$

where $f(x,\theta)$ is the p. d. f. if $x$ is a continuous variable, and is simply the
probability* if $x$ is a discrete variable.

We recall the first method of obtaining confidence intervals given in §6.11,
which depends on finding a function of $O_n$ and $\theta$ whose distribution is independent of $\theta$.
That a function of the desired type for large samples may be obtained from the likelihood
function $P(O_n,\theta)$ may be concluded by use of the central limit Theorem (C) of §4.21. The
central limit theorem applies to a sum (the average), so we replace the product in (a) by
a sum by taking logarithms:

(b)  $\log P(O_n,\theta) = \sum_{i=1}^{n} y_i,$

where $y_i = \log f(x_i,\theta)$ may be regarded as a random variable for any fixed $\theta$. To apply
the central limit theorem we need $E(y)$ and $\sigma_y^2$, where $y = \log f(x,\theta)$. Now

(c)  $E(y) = \int_{-\infty}^{+\infty} \log f(x,\theta)\, d_xF(x,\theta),$

where $d_xF(x,\theta) = f(x,\theta)dx$ in the continuous case, and the integral (c) becomes a sum in
the discrete case. The calculation (c) does not give a simple result, but it is clear that
if we employed $z = \partial y/\partial\theta$, then

    $E(z) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial\theta} \log f(x,\theta)\, d_xF(x,\theta),$

and in the continuous case this becomes

    $E(z) = \int_{-\infty}^{+\infty} \frac{\partial}{\partial\theta} f(x,\theta)\, dx.$


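*If $x$ is discrete, $f(x,\theta) = F(x,\theta) - F(x-0,\theta)$.

If the order of integration and differentiation may be interchanged,

(d)  $E(z) = 0.$

Let us now assume (d) to be true in any case and furthermore that $A^2 = E(z^2)$ is finite.
Differentiating (b) we get

    $\frac{\partial \log P}{\partial\theta} = \sum_{i=1}^{n} z_i,$

and hence $\partial \log P/\partial\theta$ has mean $0$ and variance $nA^2$. Applying the central limit theorem
to $z$ we have that $\frac{1}{\sqrt{n}A}\frac{\partial \log P}{\partial\theta}$ is asymptotically distributed according to
$N(0,1)$. Hence if $d_\epsilon$ is defined by $\frac{1}{\sqrt{2\pi}}\int_{-d_\epsilon}^{d_\epsilon} e^{-t^2/2}dt = \epsilon$, we have approximately,
for large $n$,

    $\Pr\left(-d_\epsilon \le \frac{1}{\sqrt{n}A}\frac{\partial \log P}{\partial\theta} \le d_\epsilon\right) = \epsilon,$

and if this relation can be inverted with respect to $\theta$ we obtain limits $\underline{\theta}$, $\bar{\theta}$ such
that

(f)  $\Pr(\underline{\theta} \le \theta \le \bar{\theta}) = \epsilon.$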


The asymptotic confidence intervals (f) furnished by $\frac{1}{\sqrt{n}A}\frac{\partial \log P}{\partial\theta}$ are optimum
in the following sense: The mean value of $\left(\frac{\partial}{\partial\theta}\left[\frac{1}{\sqrt{n}A}\frac{\partial \log P}{\partial\theta}\right]\right)^2$ is greater than
the corresponding mean value for any other function $G(O_n)$ of the sample which has
$N(0,1)$ as its limiting distribution.* This maximum property of the mean squared rate of
change with respect to $\theta$ implies shortest average confidence intervals in a certain
sense, since confidence intervals are obtained by taking the inverse of
$\frac{1}{\sqrt{n}A}\frac{\partial \log P}{\partial\theta}$ with respect to $\theta$.

Example: Suppose samples of size $n$ are drawn from a population having the
binomial distribution

    $f(x,p) = dF(x,p) = p^x(1-p)^{1-x}, \quad x = 0, 1.$

In a sample of size $n$,

    $P(O_n,p) = p^{n_1}(1-p)^{n-n_1},$

where $n_1 = \sum_{i=1}^{n} x_i$. We verify that $E(\partial \log f/\partial p) = 0$ and calculate

    $A^2 = E\left[\left(\frac{\partial \log f}{\partial p}\right)^2\right] = \frac{1}{p(1-p)},$

and

    $\frac{1}{\sqrt{n}A}\frac{\partial \log P}{\partial p} = \frac{n_1 - np}{\sqrt{n}\sqrt{p(1-p)}} = \frac{(\frac{n_1}{n} - p)\sqrt{n}}{\sqrt{p(1-p)}}.$

Therefore, to find approximate confidence limits with confidence coefficient $\epsilon$, we
invert the expression

    $\Pr\left(-d_\epsilon \le \frac{(\frac{n_1}{n} - p)\sqrt{n}}{\sqrt{p(1-p)}} \le d_\epsilon\right) = \epsilon,$

obtaining

    $\Pr(\underline{p} \le p \le \bar{p}) = \epsilon,$

where $\underline{p}$ and $\bar{p}$ are given as the roots of the quadratic $(\frac{n_1}{n} - p)^2 n = d_\epsilon^2(p - p^2)$.

*For proof for the case where $G(O_n)$ is of the form $\sum_i h(x_i,\theta)$, where $G(O_n)$ is
asymptotically distributed according to $N(0,1)$, see S. S. Wilks, Annals of Math. Stat.,
Vol. 9 (1938), pp. 166-175, and for more general results, see A. Wald, Annals of Math.
Stat., Vol. 13 (1942), pp. 127-137.
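The roots of this quadratic are easy to compute. A minimal sketch follows; the function name `large_sample_limits` is hypothetical, and the value $d_\epsilon = 1.96$ used in the usage note is the familiar normal deviate for $\epsilon = .95$, supplied here as an assumption rather than taken from the text. Collecting terms, the quadratic is $(n + d_\epsilon^2)p^2 - (2n_1 + d_\epsilon^2)p + n_1^2/n = 0$.

```python
from math import sqrt


def large_sample_limits(n1, n, d):
    """Roots of (n1/n - p)**2 * n = d**2 * (p - p**2), smaller root first."""
    phat = n1 / n
    a = n + d * d                      # coefficient of p**2
    b = -(2.0 * n * phat + d * d)      # coefficient of p
    c = n * phat * phat                # constant term
    disc = sqrt(b * b - 4.0 * a * c)   # non-negative whenever 0 <= phat <= 1
    return (-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)
```

With `large_sample_limits(50, 100, 1.96)` the two limits come out at about 0.404 and 0.596, symmetric about the observed proportion 0.5.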

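6.13 Confidence Intervals in the Case where the Distribution Depends on
Several Parameters

Suppose that the c. d. f. of the population depends on parameters $\theta_1, \theta_2, \dots, \theta_h$
and we wish to estimate $\theta_1$. If there exist functions $\underline{\theta}_1(O_n)$, $\bar{\theta}_1(O_n)$ of the sample,
such that the probability that the random interval

    $\delta(O_n)$: $\underline{\theta}_1(O_n) \le \theta_1 \le \bar{\theta}_1(O_n)$

cover the true value of $\theta_1$ does not depend on the true values of $\theta_1, \theta_2, \dots, \theta_h$,

    $\Pr\{\theta_1 \in \delta(O_n)\} = \epsilon$, independent of $\theta_1, \theta_2, \dots, \theta_h$,

then we say that $\delta(O_n)$ is a confidence interval for $\theta_1$ with confidence coefficient $\epsilon$.
(The parameters $\theta_2, \dots, \theta_h$ are sometimes called nuisance parameters.)

Example 1 (Mean of a normal population): If $O_n$ is a sample from a population
with distribution $N(a,\sigma^2)$, then in the notation of §5.3,

    $t = \sqrt{n}(\bar{x} - a)/s$

has the t-distribution $g_{n-1}(t)$ with $n-1$ degrees of freedom. Define $t_\epsilon$ from

    $\int_{-t_\epsilon}^{t_\epsilon} g_{n-1}(t)\,dt = \epsilon.$

Then

    $\epsilon = \Pr(-t_\epsilon \le \sqrt{n}(\bar{x}-a)/s \le t_\epsilon) = \Pr(\bar{x} - t_\epsilon s/\sqrt{n} \le a \le \bar{x} + t_\epsilon s/\sqrt{n}),$

whatever be the true values of $a$ and $\sigma^2$. Hence $(\bar{x} - t_\epsilon s/\sqrt{n},\ \bar{x} + t_\epsilon s/\sqrt{n})$ is a
confidence interval for $a$ with confidence coefficient $\epsilon$.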
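The defining equation for $t_\epsilon$ can be solved numerically. The sketch below is illustrative only and rests on several assumptions: it takes $g_k(t)$ to be the usual Student density with $k$ degrees of freedom, takes $s^2 = \sum(x_i-\bar{x})^2/(n-1)$, integrates by Simpson's rule and finds $t_\epsilon$ by bisection. All function names, the step count and the tolerance are hypothetical choices.

```python
from math import gamma, pi, sqrt


def t_density(t, k):
    """Student t p.d.f. with k degrees of freedom (assumed form of g_k)."""
    c = gamma((k + 1) / 2.0) / (sqrt(k * pi) * gamma(k / 2.0))
    return c * (1.0 + t * t / k) ** (-(k + 1) / 2.0)


def central_prob(t_eps, k, steps=2000):
    """Simpson's rule for the integral of the density over (-t_eps, t_eps)."""
    h = 2.0 * t_eps / steps
    total = t_density(-t_eps, k) + t_density(t_eps, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(-t_eps + i * h, k)
    return total * h / 3.0


def find_t_eps(k, eps, tol=1e-7):
    """Solve central_prob(t, k) = eps for t by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while central_prob(hi, k) < eps:  # widen the bracket until it contains t_eps
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if central_prob(mid, k) < eps:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mean_interval(xs, eps=0.95):
    """Confidence interval (xbar - t*s/sqrt(n), xbar + t*s/sqrt(n)) for a."""
    n = len(xs)
    xbar = sum(xs) / n
    s = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # assumed definition of s
    half = find_t_eps(n - 1, eps) * s / sqrt(n)
    return xbar - half, xbar + half
```

For instance, `find_t_eps(9, 0.95)` agrees with the tabulated value 2.262 to three decimals.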


Example 2 (Difference of means of two normal populations known to have the same
variance): Let $O_{n_i}$: $(x_{i1},\dots,x_{in_i})$ be a sample of size $n_i$ from $N(a_i,\sigma^2)$, $i = 1, 2$.
Let

    $\bar{x}_i = \sum_{\alpha=1}^{n_i} x_{i\alpha}/n_i, \qquad S_i = \sum_{\alpha=1}^{n_i}(x_{i\alpha} - \bar{x}_i)^2,$

    $d = \bar{x}_1 - \bar{x}_2, \qquad a = a_1 - a_2.$

Then by §5.25 $S_i/\sigma^2$ has the $\chi^2$-distribution with $n_i - 1$ degrees of freedom, hence
(§5.25) $S/\sigma^2$, where $S = S_1 + S_2$, has the $\chi^2$-distribution with $n_1 + n_2 - 2$ degrees
of freedom. Furthermore, $y = (d-a)/[\sigma^2(\frac{1}{n_1} + \frac{1}{n_2})]^{1/2}$ has the distribution $N(0,1)$,
and since $y$ and $S/\sigma^2$ are statistically independent (§5.25), it follows that
$\sigma y/[S/(n_1+n_2-2)]^{1/2}$ has the t-distribution with $n_1 + n_2 - 2$ degrees of freedom,
$g_{n_1+n_2-2}(t)$. Defining $t_\epsilon$ from

    $\int_{-t_\epsilon}^{t_\epsilon} g_{n_1+n_2-2}(t)\,dt = \epsilon,$

we find by the method of Example 1 that a confidence interval for $a$ is $(d - t_\epsilon s',\ d + t_\epsilon s')$,
where

    $s' = \left[\frac{(n_1+n_2)\,S}{(n_1+n_2-2)\,n_1 n_2}\right]^{1/2}.$

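Example 3 (Variance of a normal distribution): Let $O_n$ be a sample from $N(a,\sigma^2)$.
Let

    $S = \sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad \bar{x} = \sum_{i=1}^{n} x_i/n.$

Then (§5.25) $S/\sigma^2$ has the $\chi^2$-distribution with $n-1$ degrees of freedom. Let
$\chi^2_{\epsilon 1}$, $\chi^2_{\epsilon 2}$ be points on the range $(0,\infty)$ such that

    $\int_{\chi^2_{\epsilon 1}}^{\chi^2_{\epsilon 2}} f_{n-1}(\chi^2)\,d\chi^2 = \epsilon.$

We find that $(S/\chi^2_{\epsilon 2},\ S/\chi^2_{\epsilon 1})$ is a confidence interval for $\sigma^2$ with confidence
coefficient $\epsilon$.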

Example 4 (Ratio of the variances of two normal distributions): Let $O_{n_i}$:
$(x_{i1},x_{i2},\dots,x_{in_i})$ be a sample of size $n_i$ from $N(a_i,\sigma_i^2)$, $i = 1, 2$. Let

    $s_i^2 = \sum_{\alpha=1}^{n_i}(x_{i\alpha} - \bar{x}_i)^2/(n_i - 1), \qquad \bar{x}_i = \sum_{\alpha=1}^{n_i} x_{i\alpha}/n_i,$

    $T = s_1^2/s_2^2, \qquad \theta = \sigma_1^2/\sigma_2^2.$

Since $(n_i - 1)s_i^2/\sigma_i^2$, for $i = 1, 2$, are independently distributed according to
$\chi^2$-distributions with $n_i - 1$ degrees of freedom respectively, it follows from §5.4 that
$T/\theta$ has the F-distribution $h_{n_1-1,\,n_2-1}(F)$ with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.
Pick a pair of limits $F_{\epsilon 1}$, $F_{\epsilon 2}$ so that

    $\int_{F_{\epsilon 1}}^{F_{\epsilon 2}} h_{n_1-1,\,n_2-1}(F)\,dF = \epsilon.$

Then a confidence interval for $\theta$ is $(T/F_{\epsilon 2},\ T/F_{\epsilon 1})$.

*since $S_1$ and $S_2$ are statistically independent.

6.14 Confidence Regions

We suppose that the population distribution depends on parameters $\theta_1, \theta_2, \dots, \theta_h$.
We denote the parameter point $(\theta_1,\dots,\theta_h)$ by $\Theta$, and the entire h-dimensional space of
admissible parameter values by $\Omega$. If $\delta(O_n)$ is a random region in $\Omega$ which depends on the
sample $O_n$, but not on the unknown parameter point $\Theta$, and if the probability that the
random region $\delta(O_n)$ cover the true parameter point $\Theta$ is independent of $\Theta$,

    $\Pr\{\Theta \in \delta(O_n)\} = \epsilon$, independent of $\Theta$,

then we say that $\delta(O_n)$ is a confidence region for $\Theta$, with confidence coefficient $\epsilon$.

It may be desired to estimate only a subset $\theta_1, \theta_2, \dots, \theta_m$, $m < h$, of the $h$
parameters (the remaining parameters are called nuisance parameters). Denote the
m-dimensional space of $\Theta'$: $(\theta_1,\theta_2,\dots,\theta_m)$ by $\Omega'$. If $\delta'(O_n)$ is a random region in $\Omega'$
such that

    $\Pr\{\Theta' \in \delta'(O_n)\} = \epsilon$, independent of $\Theta$,

whatever be the true value $\Theta$, then $\delta'(O_n)$ is said to be a confidence region for $\Theta'$ with
confidence coefficient $\epsilon$.

Example: Suppose $O_{n_1}$ and $O_{n_2}$ are samples from normal populations $N(a_1,\sigma^2)$ and
$N(a_2,\sigma^2)$ respectively. We know from §5.25 that $S_1/\sigma^2$, $S_2/\sigma^2$ (defined in Example 2,
§6.13),

    $n_1(\bar{x}_1 - a_1)^2/\sigma^2, \qquad n_2(\bar{x}_2 - a_2)^2/\sigma^2$

are independently distributed according to $\chi^2$-laws with $n_1-1$, $n_2-1$, $1$, $1$ degrees of
freedom respectively. By §5.23 it follows that

    $\frac{S_1 + S_2}{\sigma^2}$ and $\frac{n_1(\bar{x}_1-a_1)^2 + n_2(\bar{x}_2-a_2)^2}{\sigma^2}$

are independently distributed according to $\chi^2$-laws with $n_1 + n_2 - 2$ and $2$ degrees of
freedom respectively. Hence if we set

    $F = \frac{n-2}{2}\cdot\frac{n_1(\bar{x}_1-a_1)^2 + n_2(\bar{x}_2-a_2)^2}{S_1 + S_2},$

then $F$ is distributed according to $h_{2,\,n-2}(F)$, where $n = n_1 + n_2$. Therefore if $F_\epsilon$ is
chosen so that

    $\int_0^{F_\epsilon} h_{2,\,n-2}(F)\,dF = \epsilon,$

we may say that

    $\Pr\left\{\frac{n-2}{2}\cdot\frac{n_1(\bar{x}_1-a_1)^2 + n_2(\bar{x}_2-a_2)^2}{S_1+S_2} \le F_\epsilon\right\} = \epsilon,$

which is equivalent to the statement that

    $\Pr\{(a_1,a_2) \in \delta(O_n)\} = \epsilon,$

where $\delta(O_n)$ is the region in the $(a_1,a_2)$ plane bounded by the random ellipse with
equation

    $n_1(\bar{x}_1-a_1)^2 + n_2(\bar{x}_2-a_2)^2 = F_\epsilon\cdot\frac{2(S_1+S_2)}{n-2}.$

In other words, the probability is $\epsilon$ that this ellipse will cover the true
parameter point $(a_1,a_2)$.
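The coverage property of the random ellipse can be illustrated by simulation. The sketch below is not part of the text and makes these assumptions: function names, trial count and seed are arbitrary, and $F_\epsilon$ is obtained from the closed form available when the first number of degrees of freedom is 2, namely $\Pr(F \le f) = 1 - (1 + 2f/m)^{-m/2}$ for 2 and $m$ degrees of freedom.

```python
import random


def f_eps_2(m, eps):
    """F_eps for 2 and m degrees of freedom, from the closed-form c.d.f.
    Pr(F <= f) = 1 - (1 + 2*f/m)**(-m/2)."""
    return (m / 2.0) * ((1.0 - eps) ** (-2.0 / m) - 1.0)


def ellipse_covers(x1, x2, a1, a2, eps):
    """True if the point (a1, a2) lies inside the random ellipse."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    s1 = sum((x - m1) ** 2 for x in x1)  # S_1
    s2 = sum((x - m2) ** 2 for x in x2)  # S_2
    lhs = n1 * (m1 - a1) ** 2 + n2 * (m2 - a2) ** 2
    rhs = f_eps_2(n1 + n2 - 2, eps) * 2.0 * (s1 + s2) / (n1 + n2 - 2)
    return lhs <= rhs


def coverage(a1, a2, sigma, n1, n2, eps, trials=20000, seed=2):
    """Proportion of simulated sample pairs whose ellipse covers (a1, a2)."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    hits = 0
    for _ in range(trials):
        x1 = [rng.gauss(a1, sigma) for _ in range(n1)]
        x2 = [rng.gauss(a2, sigma) for _ in range(n2)]
        hits += ellipse_covers(x1, x2, a1, a2, eps)
    return hits / trials
```

A call such as `coverage(1.0, -2.0, 3.0, 6, 9, 0.90)` should return a proportion close to .90, whatever common variance is used.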

6.2 Point Estimation: Maximum Likelihood Statistics 

Throughout this section we consider the point estimation of a parameter $\theta$ in the
c. d. f. of a population. There may be other unknown parameters present. If so, we denote
these by $\theta_2, \theta_3, \dots, \theta_h$. A statistic is any function $T(O_n)$ of the sample, not depending
on $\theta$, or on any other parameters if such are present. Point estimation consists of the
use of a single statistic for estimating the parameter; confidence intervals, we recall,
involve two statistics, the end-points of the confidence interval, satisfying certain
conditions (§6.1). Desirable conditions for statistics used as point estimates have been
given by R. A. Fisher: An optimum estimate satisfies the criteria of consistency,
efficiency, and sufficiency, defined below. A method which sometimes yields optimum
statistics is Fisher's method of maximum likelihood.

6.21 Consistency 

A statistic $T(O_n)$ is said to be a consistent estimate of $\theta$ if $T$ converges
stochastically (§4.21) to $\theta$ as $n \to \infty$. From Theorem (B) of §4.21 we know that whenever
the population has finite variance, the sample mean is a consistent estimate of the
population mean. We remark that consistency is purely an asymptotic property. If for every
$n$, $E(T) = \theta$, then we say that the statistic $T$ is unbiased. It follows from Theorem (A)
of §4.21 that the sample mean is always an unbiased estimate of the population mean
(whenever the latter exists). The following theorem enables us to recognize the consistency

(ii) for any other statistic $T'(O_n)$ such that $\sqrt{n}(T'-\theta)$ is asymptotically distributed
according to $N(0,\mu')$, $\mu \le \mu'$.

Since the asymptotic mean and variance of $T$ are $\theta$ and $\mu/n$, respectively, it follows from
Theorem (A) of §6.21 that (i) implies the consistency of $T$. The efficiency of $T'$ in
estimating $\theta$ is defined by $E = \mu/\mu'$.

Example: Consider the sample mean $\bar{x}$ and the sample median $\tilde{x}$ of $O_n$ from $N(a,\sigma^2)$
as estimates of $a$. We have that $\sqrt{n}(\bar{x}-a)$ is distributed according to $N(0,\sigma^2)$, and
from §4.53 that $\sqrt{n}(\tilde{x}-a)$ is asymptotically distributed according to $N(0,\frac{1}{2}\pi\sigma^2)$.
Hence $\bar{x}$ is more efficient than $\tilde{x}$. However, to prove $\bar{x}$ "efficient" it would be
necessary to verify condition (ii) of the definition. This example may be generalized as
follows: If $O_n$ is from a population with p. d. f. $f(x)$, if the population median = $a$
(the population mean), and if $f(x)$ is continuous at $a$, then using the results of §4.53
on the asymptotic distribution of $\tilde{x}$, we find that $\bar{x}$ is a more efficient estimate of
$a$ than $\tilde{x}$ if $\sigma < [2f(a)]^{-1}$; $\tilde{x}$ is more efficient if $\sigma > [2f(a)]^{-1}$.
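The comparison of mean and median in the normal case can be illustrated by simulation. This is only a sketch under assumptions: the sample size, number of trials, seed and function name are arbitrary choices, and the finite-sample variance of the median only approximates its asymptotic value $\frac{1}{2}\pi\sigma^2/n$.

```python
import random
from statistics import median, pvariance


def scaled_variances(n=101, trials=4000, sigma=1.0, seed=3):
    """Return n*Var(sample mean) and n*Var(sample median), estimated from
    repeated normal samples; the limits are sigma**2 and pi*sigma**2/2."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    means, medians = [], []
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(sum(xs) / n)
        medians.append(median(xs))
    return n * pvariance(means), n * pvariance(medians)
```

The ratio of the second returned value to the first should be near $\pi/2 \approx 1.57$, so the efficiency of the median relative to the mean is roughly $2/\pi$ in this case.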

6.23 Sufficiency 

$T$ is said to be a sufficient statistic for estimating $\theta$ if for any other
statistic $T'$, the conditional distribution $f(T'|T)$ of $T'$, given $T$, is independent of $\theta$.
(We use the same notation $f(T'|T)$ whether the population is continuous or discrete.)
Thus, expected values, moments and other probability calculations about $T'$, given $T$,
will be calculated from $f(T'|T)$ and hence will not depend on $\theta$, but they will depend on
$T$ in general. Or, in Fisher's terminology, a sufficient statistic "exhausts the
information" in a sample. We note that sufficiency, unlike consistency and efficiency, is
not merely an asymptotic property.

A convenient method of spotting sufficient statistics is embodied in

Theorem (A): If the population distribution is continuous, let $P(O_n;\theta,\theta_2,\dots,\theta_h)$
be the p. d. f. of $O_n$; if the distribution is discrete, let $P(O_n;\theta,\theta_2,\dots,\theta_h)$ be the
discrete probability of $O_n$. In either case a necessary and sufficient condition that $T$
be a sufficient statistic for estimating $\theta$ is that the function $P$ factor in the
following manner*

    $P(O_n;\theta,\theta_2,\dots,\theta_h) = g_1(T;\theta,\theta_2,\dots,\theta_h)\cdot g_2(O_n;\theta_2,\dots,\theta_h).$

A sufficient set of statistics with regard to a set of parameters may be defined,
and an analogue of Theorem (A) obtained for that case; see Neyman and Pearson, Statistical
Research Memoirs, Vol. 1 (1936), pp. 119-121.


*For proof, see J. Neyman, Giornale dell'Istituto Italiano degli Attuari, Vol. 6 (1935),
pp. 320-334.

Example 1: Suppose $O_n$ is from $N(a,\sigma^2)$. Then

    $P(O_n;a,\sigma^2) = e^{-\frac{1}{2}n(\bar{x}-a)^2/\sigma^2}\cdot(2\pi\sigma^2)^{-\frac{1}{2}n}\, e^{-\frac{1}{2}S/\sigma^2},$

where $S = \sum_{i=1}^{n}(x_i - \bar{x})^2$.
(Here and in the following examples the factors corresponding to $g_1$ and $g_2$ of Theorem
(A) are separated by a dot.) Hence $\bar{x}$ is a sufficient statistic for estimating $a$.
In this case there is no sufficient statistic for $\sigma^2$, but it is easily shown that
$\bar{x}$, $S/(n-1)$ are a sufficient set of (unbiased) statistics for $a$ and $\sigma^2$.

Example 2: For $O_n$ from $N(0,\theta)$,

    $P(O_n;\theta) = (2\pi\theta)^{-\frac{1}{2}n}\, e^{-\frac{1}{2}S'/\theta}\cdot 1,$

where $S' = \sum_{i=1}^{n} x_i^2$. Hence $S'$ is a sufficient statistic for estimating $\theta$. $S'/n$ is an
unbiased (see Ex. 2, §6.24) sufficient statistic.

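Example 3: Suppose the population has the discrete distribution

    $p(x;\theta) = \theta^x e^{-\theta}/x!, \quad x = 0, 1, 2, \dots$

We recall from §3.13 that $E(x) = \theta$. For a sample $O_n$: $(x_1,x_2,\dots,x_n)$,

    $P(O_n;\theta) = \prod_{i=1}^{n} \theta^{x_i} e^{-\theta}/x_i!.$

If we write this

    $P(O_n;\theta) = \left(\theta^{\sum x_i}\, e^{-n\theta}\right)\cdot\left(1/\prod_{i=1}^{n} x_i!\right),$

we see that $\sum_{i=1}^{n} x_i$ is a sufficient statistic for estimating $\theta$. Since

    $E\left(\sum_{i=1}^{n} x_i\right) = n\theta,$

it follows that $\bar{x} = \sum_{i=1}^{n} x_i/n$ is an unbiased sufficient statistic.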
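The factorization in Example 3 has a concrete consequence that is easy to check numerically: two samples of the same size with the same total $\sum x_i$ have likelihoods whose ratio does not involve $\theta$. The following sketch illustrates this; the function names and the two sample vectors in the usage note are hypothetical.

```python
from math import exp, factorial


def poisson_likelihood(xs, theta):
    """P(O_n; theta): product of theta**x * exp(-theta) / x! over the sample."""
    p = 1.0
    for x in xs:
        p *= theta ** x * exp(-theta) / factorial(x)
    return p


def likelihood_ratio(xs, ys, theta):
    """Ratio of the likelihoods of two samples at the same theta."""
    return poisson_likelihood(xs, theta) / poisson_likelihood(ys, theta)
```

For the samples `[0, 2, 4]` and `[2, 2, 2]`, which share the total 6, the ratio equals $(2!\,2!\,2!)/(0!\,2!\,4!) = 1/6$ at every value of $\theta$; only the factor $g_2$ differs between them.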

6.24 Maximum Likelihood Estimates

The function $P(O_n;\theta,\theta_2,\dots,\theta_h)$ defined in Theorem (A) of §6.23, when considered
as a function of the parameter point $\theta,\theta_2,\dots,\theta_h$ for fixed $O_n$, is called the likelihood
of the parameter point. If the likelihood function $P$ has a unique maximum at
$\theta = \hat{\theta}(O_n)$, $\theta_2 = \hat{\theta}_2(O_n)$, ..., $\theta_h = \hat{\theta}_h(O_n)$, then the set of statistics
$\hat{\theta}, \hat{\theta}_2, \dots, \hat{\theta}_h$ is called the maximum likelihood estimate of the parameter point.

Let us consider the case of one parameter, say $\theta$. In Theorem (A), §6.12, it was
shown that under certain conditions the quantity $\frac{1}{\sqrt{n}A}\frac{\partial \log P}{\partial\theta}$ is asymptotically
distributed according to $N(0,1)$. Let us assume that $\hat{\theta}$ is the value of $\theta$ which maximizes
$P$ and that we can make the following expansion about $\theta = \hat{\theta}$,

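(a)  $\frac{1}{\sqrt{n}A}\frac{\partial \log P}{\partial\theta} = \frac{1}{\sqrt{n}A}\left[\frac{\partial \log P}{\partial\theta}\right]_{\hat{\theta}} + \frac{1}{\sqrt{n}A}\left[\frac{\partial^2 \log P}{\partial\theta^2}\right]_{\hat{\theta}}(\theta-\hat{\theta}) + \frac{1}{2\sqrt{n}A}\left[\frac{\partial^3 \log P}{\partial\theta^3}\right]_{\theta'}(\theta-\hat{\theta})^2$

     $= -AV + UV + \frac{WV^2}{\sqrt{n}},$

where $\theta'$ is on the interval $(\theta,\hat{\theta})$, and

    $U = \frac{1}{A}\left(\frac{1}{n}\left[\frac{\partial^2 \log P}{\partial\theta^2}\right]_{\hat{\theta}} + A^2\right), \qquad V = \sqrt{n}(\theta - \hat{\theta}), \qquad W = \frac{1}{2A}\cdot\frac{1}{n}\left[\frac{\partial^3 \log P}{\partial\theta^3}\right]_{\theta'}.$

We have employed the fact that $\partial P/\partial\theta$ vanishes for $\theta = \hat{\theta}$. Now from Theorem (A), §6.12,

(b)  $\lim_{n\to\infty} \Pr\left(\frac{1}{\sqrt{n}A}\frac{\partial \log P}{\partial\theta} < d\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{d} e^{-\frac{1}{2}t^2}\,dt.$

Making use of (a) we may write

(c)  $\lim_{n\to\infty} \Pr\left(-AV + UV + \frac{WV^2}{\sqrt{n}} < d\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{d} e^{-\frac{1}{2}t^2}\,dt.$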


Considering $U$, $V$, $W$ as three random variables, the left side of (c) states that the
probability of $U$, $V$, $W$ falling into a certain region in this space is given by the
expression on the right. Now let us assume (1) that $\frac{1}{n}\left[\frac{\partial^2 \log P}{\partial\theta^2}\right]_{\hat{\theta}}$ converges
stochastically to $E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right)$, which we shall assume $= -A^2$ (implying that $U$ converges
stochastically to 0) as $n \to \infty$, (2) that $\frac{1}{n}\left[\frac{\partial^3 \log P}{\partial\theta^3}\right]_{\theta'}$ (and hence $2AW$) converges
stochastically to some finite number $K$, and (3) that $V$ has some limiting non-degenerate
p. d. f. as $n \to \infty$ (i.e., has a c. d. f. which is continuous); then the limiting form of
the distribution function in the $U$, $V$, $W$ space as $n \to \infty$ is a one-dimensional p. d. f.
along the straight line

    $U = 0, \qquad W = K/2A.$

The p. d. f. on this line is that of the limiting distribution of $V$. Hence,

    $\lim_{n\to\infty} \Pr\left(-AV + UV + \frac{WV^2}{\sqrt{n}} < d\right) = \lim_{n\to\infty} \Pr(-AV < d).$

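The equality of the two expressions for $A^2$ is a reasonable assumption, as the reader
will see from the following discussion:

(d)  $\frac{\partial f}{\partial\theta} = f\,\frac{\partial \log f}{\partial\theta}.$

Differentiating this with respect to $\theta$, we get

(e)  $f\,\frac{\partial^2 \log f}{\partial\theta^2} + \frac{\partial f}{\partial\theta}\,\frac{\partial \log f}{\partial\theta} = \frac{\partial^2 f}{\partial\theta^2}.$

Substituting (d) into (e), and integrating with respect to $x$ from $-\infty$ to $+\infty$, we have

    $E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right) + E\left[\left(\frac{\partial \log f}{\partial\theta}\right)^2\right] = \int_{-\infty}^{+\infty} \frac{\partial^2 f}{\partial\theta^2}\,dx.$

Now if we may interchange the order of integration and differentiation in the right
member, then it is seen to be equal to

    $\frac{\partial^2}{\partial\theta^2}\int_{-\infty}^{+\infty} f\,dx = \frac{\partial^2}{\partial\theta^2}(1) = 0.$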

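We may summarize in the following

Theorem (A): Let $O_n(x_1,x_2,\dots,x_n)$ be a sample from a population with c. d. f.
$F(x;\theta)$. Let $P(O_n;\theta) = \prod_{i=1}^{n} f(x_i;\theta)$ be the likelihood function, where $f(x;\theta)$ is the
p. d. f. if $x$ is a continuous variable and the probability of $x$ if $x$ is discrete. Let
$P(O_n;\theta)$ have a unique maximum at $\theta = \hat{\theta}$, and assume

(i)  $E\left[\left(\frac{\partial \log f}{\partial\theta}\right)^2\right] = -E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right) = A^2,$

and that as $n \to \infty$

(ii)  $\frac{1}{n}\left[\frac{\partial^2 \log P}{\partial\theta^2}\right]_{\hat{\theta}}$ converges stochastically to $E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right)$,

(iii)  $\frac{1}{n}\left[\frac{\partial^3 \log P}{\partial\theta^3}\right]_{\theta'}$ converges stochastically to a finite $K$,

(iv)  $\sqrt{n}(\hat{\theta}-\theta)$ has a limiting non-degenerate p. d. f.

Then $\sqrt{n}A(\hat{\theta}-\theta)$ is asymptotically distributed according to $N(0,1)$.

Under fairly general conditions, which will not be given here, it can be shown
that if $\theta^*$ is any other statistic such that $\sqrt{n}(\theta^*-\theta)$ is asymptotically distributed
according to $N(0,B^2)$, then $B^2 \ge 1/A^2$.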

In the present case where the c. d. f. $F(x;\theta)$ depends on only one parameter it
is often possible to transform from the old parameter $\theta$ to a new parameter $\phi$ so that the
asymptotic variance of $\hat{\phi}$, the maximum likelihood estimate of $\phi$, will be independent of
the parameter. Let $A^2$ be defined as before, let $\phi = h(\theta)$, a function to be determined,
and define

    $B^2 = E\left[\left(\frac{\partial \log f}{\partial\phi}\right)^2\right] = A^2\left(\frac{d\theta}{d\phi}\right)^2.$

We will try to determine the function $h(\theta)$ so that $B^2$ is a given positive constant. We
have

    $\frac{d\phi}{d\theta} = \frac{A}{B},$

    $\phi = \frac{1}{B}\int A\,d\theta + C,$

where $C$ is any arbitrary constant. If the last equation determines $\phi$ as a monotonic
continuous function of $\theta$, then since $P(O_n;\theta)$ has a unique maximum for $\theta = \hat{\theta}$, clearly
$P(O_n;h^{-1}(\phi))$ has a unique maximum for $\phi = \hat{\phi} = h(\hat{\theta})$. By Theorem (A) the asymptotic
variance of $\hat{\phi}$, the maximum likelihood estimate of $\phi$, will be $1/(nB^2)$, which is
independent of $\phi$. As an illustration the reader can verify that in Example 2 below,
$\phi = \log\theta$ is a new parameter of the desired type.
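The verification suggested for Example 2 can be sketched as follows. It assumes the value $A^2 = \tfrac{1}{2}\theta^{-2}$ obtained in that example for the $N(0,\theta)$ population, and the particular choices $B = 1/\sqrt{2}$ and $C = 0$ are made here only to give the simplest form of $\phi$.

```latex
% Assumption: Example 2 of this section, where the population is N(0,\theta)
% and A^2 = \tfrac{1}{2}\theta^{-2}.
\begin{align*}
A &= \frac{1}{\sqrt{2}\,\theta}, \\
\phi &= \frac{1}{B}\int A\,d\theta + C
      = \frac{1}{\sqrt{2}\,B}\,\log\theta + C .
\end{align*}
% Choosing B = 1/\sqrt{2} and C = 0 gives \phi = \log\theta, and the asymptotic
% variance of \hat\phi = \log\hat\theta is 1/(nB^2) = 2/n, which is free of \theta.
```

Any other positive constant $B$ merely rescales $\phi$, so the choice affects only the constant value of the asymptotic variance.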

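Theorem (A), §6.12, and Theorem (A) of the present section can both be extended to
the case of several parameters. In the case of several parameters it may be shown under
conditions analogous to those in Theorem (A) that for large $n$, $\sqrt{n}(\hat{\theta}_1-\theta_1)$,
$\sqrt{n}(\hat{\theta}_2-\theta_2)$, ..., $\sqrt{n}(\hat{\theta}_h-\theta_h)$ are asymptotically distributed according to a normal
multivariate distribution with variance-covariance matrix $||\sigma_i\sigma_j\rho_{ij}||$ given by
$||A_{ij}||^{-1}$, where

    $A_{ij} = E\left(\frac{\partial \log f}{\partial\theta_i}\,\frac{\partial \log f}{\partial\theta_j}\right) = -E\left(\frac{\partial^2 \log f}{\partial\theta_i\,\partial\theta_j}\right).$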

Example 1: Suppose $O_n$ is from $N(a,1)$:

    $f(x;a) = (2\pi)^{-\frac{1}{2}}\, e^{-\frac{1}{2}(x-a)^2},$

    $\log f = -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}(x-a)^2,$

    $\partial \log f/\partial a = x - a.$

If we use the first of the two expressions for $A^2$ we get

    $A^2 = E[(x-a)^2] = \sigma^2 = 1.$

To use the second expression we would have to take the expected value of

    $-\partial^2 \log f/\partial a^2 = 1,$

and we note we get the same result. To find $\hat{a}$ we inspect

    $P(O_n;a) = (2\pi)^{-\frac{1}{2}n}\, e^{-\frac{1}{2}[n(\bar{x}-a)^2 + S]}$

and see that this is maximum when the quantity in brackets is minimum, that is for

    $\hat{a} = \bar{x}.$

Theorem (A) says that $\sqrt{n}(\bar{x}-a)$ is asymptotically distributed according to $N(0,1)$.
In the present case this is the exact distribution.


Example 2: For $O_n$ from $N(0,\theta)$,

$$f(x;\theta) = (2\pi\theta)^{-\frac12}\,e^{-\frac12 x^2/\theta},$$

$$\log f = -\tfrac12\log(2\pi) - \tfrac12\log\theta - \tfrac12 x^2/\theta,$$

$$\partial\log f/\partial\theta = \tfrac12(-1/\theta + x^2/\theta^2).$$

Let us see whether it may not be easier to calculate $A^2$ from the other formula:

$$\partial^2\log f/\partial\theta^2 = \tfrac12\theta^{-2} - x^2/\theta^3,$$

$$A^2 = -E(\tfrac12\theta^{-2} - x^2/\theta^3) = -\tfrac12\theta^{-2} + E(x^2/\theta)/\theta^2.$$

Since $x^2/\theta$ has the $\chi^2$-distribution with $k = 1$ degrees of freedom, its mean is $k = 1$. Hence

$$A^2 = -\tfrac12\theta^{-2} + \theta^{-2} = \tfrac12\theta^{-2}.$$

Now

$$P(O_n;\theta) = (2\pi)^{-\frac12 n}\,\theta^{-\frac12 n}\,e^{-\frac12 S'/\theta},$$

where $S' = \sum_{i=1}^n x_i^2$. Differentiating

$$\log P = -\tfrac12 n\log(2\pi) - \tfrac12 n\log\theta - \tfrac12 S'/\theta$$

with respect to $\theta$, we get

$$\partial P/\partial\theta = \tfrac12 P\cdot(-n/\theta + S'/\theta^2).$$

Equating this to zero and solving for $\theta$, we find

$$\hat\theta = S'/n.$$

By Theorem (A), $\sqrt{n}(\hat\theta-\theta)$ is asymptotically distributed according to $N(0,2\theta^2)$. Since $S'/\theta$ actually has the $\chi^2$-distribution with $n$ degrees of freedom, its exact mean and variance are $n$ and $2n$, respectively; hence the asymptotic mean and variance given by Theorem (A) turn out to be the exact mean and variance. However, the exact distribution of $n\hat\theta/\theta$ is the $\chi^2$-distribution with $n$ degrees of freedom, and not a normal distribution.
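
The statements of Example 2 can be checked by simulation. The following Python sketch is not part of the original notes; the function name `mle_variance_check`, the sample size, the number of replications and the seed are illustrative assumptions. It estimates $n\,\mathrm{Var}(\hat\theta)$, which should be near $2\theta^2$, and $n\,\mathrm{Var}(\log\hat\theta)$, which should be near $1/B^2 = 2$ whatever the value of $\theta$ (taking $\phi = \log\theta$, so that $B^2 = \tfrac12$).

```python
import math
import random
import statistics


def mle_variance_check(theta, n=100, reps=1500, seed=0):
    """Simulate theta_hat = S'/n for samples of size n from N(0, theta).

    Returns (n * Var(theta_hat), n * Var(log theta_hat)), both estimated
    from `reps` independent samples.
    """
    rng = random.Random(seed)
    sd = math.sqrt(theta)
    estimates = []
    for _ in range(reps):
        s_prime = sum(rng.gauss(0.0, sd) ** 2 for _ in range(n))  # S' = sum of x_i^2
        estimates.append(s_prime / n)                             # theta_hat = S'/n
    var_theta = n * statistics.pvariance(estimates)
    var_log = n * statistics.pvariance([math.log(e) for e in estimates])
    return var_theta, var_log


for theta in (0.5, 2.0, 8.0):
    v_theta, v_log = mle_variance_check(theta)
    print(f"theta={theta}: n*Var(theta_hat)={v_theta:.3f} "
          f"(theory {2 * theta ** 2:.3f}), "
          f"n*Var(log theta_hat)={v_log:.3f} (theory 2)")
```

The first column of output changes with $\theta$, while the second should stay close to 2, which is the sense in which $\log\theta$ is a parameter "of the desired type".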


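Example 3: As an illustration of the method of obtaining maximum likelihood estimates when the distribution is discrete, consider again the sample of Example 3, §6.25. We may write

$$P(O_n;\theta) = \theta^{n\bar x}\,e^{-n\theta}\,U,$$

where

$$U = 1\Big/\prod_{i=1}^n x_i!$$

is independent of $\theta$. To find $\hat\theta$ we set $\partial P/\partial\theta = 0$ and solve for $\theta$:

$$\log P = n\bar x\log\theta - n\theta + \log U,$$

$$\partial P/\partial\theta = P\cdot(n\bar x/\theta - n) = 0,$$

$$\hat\theta = \bar x.$$

This we have already shown to be an unbiased sufficient statistic. We calculate

$$\log f(x,\theta) = x\log\theta - \theta - \log x!,$$

$$\partial^2\log f/\partial\theta^2 = -x/\theta^2,$$

$$A^2 = -E(-x/\theta^2) = 1/\theta.$$

Thus Theorem (A) tells us that $\sqrt{n}(\bar x-\theta)$ is asymptotically distributed according to $N(0,\theta)$.

Example 4: In this example we illustrate the method of maximum likelihood for obtaining estimates when more than one parameter is present in the population distribution. Suppose $O_n$ is from $N(a,\theta)$. Then

$$P(O_n;a,\theta) = (2\pi\theta)^{-\frac12 n}\,e^{-\frac12[n(\bar x-a)^2+S]/\theta},$$

where

$$S = \sum_{i=1}^n (x_i-\bar x)^2.$$

To find the estimates $\hat a$, $\hat\theta$, we set

$$\partial P/\partial a = \partial P/\partial\theta = 0$$

and solve for $a$ and $\theta$:

$$\log P = -\tfrac12 n\log(2\pi) - \tfrac12 n\log\theta - \tfrac12[n(\bar x-a)^2+S]/\theta,$$

$$\partial P/\partial a = P[n(\bar x-a)/\theta] = 0,$$

$$\partial P/\partial\theta = \tfrac12 P\{-n/\theta + [n(\bar x-a)^2+S]/\theta^2\} = 0.$$

The solutions of these equations are easily found to be

$$\hat a = \bar x,\qquad \hat\theta = S/n.$$

As we have previously noted, these are both consistent estimates, but the latter is biased.

Let us compute the asymptotic variance-covariance matrix of $\sqrt{n}(\hat a-a)$ and $\sqrt{n}(\hat\theta-\theta)$ as given in the generalization stated below Theorem (A):

$$\log f = -\tfrac12\log\theta - \tfrac12(x-a)^2/\theta - \tfrac12\log(2\pi),$$

$$A_{aa} = -E\left(\frac{\partial^2\log f}{\partial a^2}\right) = \frac{1}{\theta},\qquad A_{\theta\theta} = -E\left(\frac{\partial^2\log f}{\partial\theta^2}\right) = \frac{1}{2\theta^2},$$

$$A_{a\theta} = -E\left(\frac{\partial^2\log f}{\partial a\,\partial\theta}\right) = 0.$$

Hence the asymptotic variance-covariance matrix is

$$\begin{Vmatrix} A_{aa} & A_{a\theta}\\ A_{\theta a} & A_{\theta\theta}\end{Vmatrix}^{-1} = \begin{Vmatrix} 1/\theta & 0\\ 0 & 1/(2\theta^2)\end{Vmatrix}^{-1} = \begin{Vmatrix}\theta & 0\\ 0 & 2\theta^2\end{Vmatrix}.$$

It is easily verified that the entries in the last matrix are exact with the exception of the (2,2) entry whose exact value is $2\theta^2(n-1)/n$.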
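
The closing remark of Example 4 can be illustrated numerically. The sketch below is an illustration only (the function name `simulate_example4`, the choice $n = 5$, the replication count and the seed are assumptions, not part of the notes). It simulates $\hat a = \bar x$ and $\hat\theta = S/n$ and compares $n$ times their sample variances and covariance with the asymptotic matrix and with the exact (2,2) value $2\theta^2(n-1)/n$.

```python
import random
import statistics


def simulate_example4(a, theta, n, reps=8000, seed=1):
    """Return n*Var(a_hat), n*Cov(a_hat, theta_hat), n*Var(theta_hat) by simulation."""
    rng = random.Random(seed)
    sd = theta ** 0.5
    a_hats, theta_hats = [], []
    for _ in range(reps):
        xs = [rng.gauss(a, sd) for _ in range(n)]
        xbar = sum(xs) / n
        a_hats.append(xbar)                                      # a_hat = x-bar
        theta_hats.append(sum((x - xbar) ** 2 for x in xs) / n)  # theta_hat = S/n
    mean_a = statistics.fmean(a_hats)
    mean_t = statistics.fmean(theta_hats)
    cov = sum((u - mean_a) * (v - mean_t)
              for u, v in zip(a_hats, theta_hats)) / reps
    return (n * statistics.pvariance(a_hats),
            n * cov,
            n * statistics.pvariance(theta_hats))


n, theta = 5, 2.0
var_a, cov, var_theta = simulate_example4(a=1.0, theta=theta, n=n)
print("n*Var(a_hat)     =", round(var_a, 3), "| asymptotic", theta)
print("n*Cov            =", round(cov, 3), "| asymptotic 0")
print("n*Var(theta_hat) =", round(var_theta, 3),
      "| asymptotic", 2 * theta ** 2,
      "| exact", 2 * theta ** 2 * (n - 1) / n)
```

A small $n$ is used deliberately so that the gap between the asymptotic value $2\theta^2$ and the exact value $2\theta^2(n-1)/n$ is visible above the simulation noise.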

6.3 Tolerance Interval Estimation 

In the foregoing sections we have discussed two methods of estimating one or more parameters in distribution functions from samples: the method of confidence intervals and the method of point estimation based on the method of maximum likelihood. If the original parameters, say $\theta_1, \theta_2, \ldots, \theta_h$, are transformed to new parameters $\phi_1, \phi_2, \ldots, \phi_h$ by any one-to-one transformation $\phi_i = \phi_i(\theta_1, \theta_2, \ldots, \theta_h)$ which is continuous and possesses first derivatives, we may apply both methods of estimation as before to the problem of estimating the new parameters. In fact, it can be readily verified that the maximum likelihood estimates of the $\phi_i$ are $\phi_i(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_h)$, $i = 1,2,\ldots,h$, where $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_h$ are maximum likelihood estimates of the $\theta_i$. A specific case of transforming a single parameter was discussed in §6.24; the problem there was to find a function of $\theta$ having a maximum likelihood estimate whose variance in large samples (to terms of order $1/n$) does not depend on this function of $\theta$.

Another problem of estimating a function of the parameter which deserves special comment is that of setting tolerance limits (see §4.55). This problem is as follows: Suppose $f(x,\theta)dx$ is the probability element of $x$ where $\theta$ is the parameter. For a given $0 < \beta' < 1$ let $L_1$ and $L_2$ be such that

$$\int_{-\infty}^{L_1} f(x,\theta)\,dx = \int_{L_2}^{\infty} f(x,\theta)\,dx = \tfrac12(1-\beta').$$

$L_1$ and $L_2$ are continuous functions of $\beta'$ and $\theta$; denote them by $L_1(\theta,\beta')$, $L_2(\theta,\beta')$. From the discussion in the paragraph above it follows that in a sample of size $n$ the maximum likelihood estimates of $L_1(\theta,\beta')$ and of $L_2(\theta,\beta')$ are $L_1(\hat\theta,\beta')$ and $L_2(\hat\theta,\beta')$, which are completely expressible in terms of the sample, when the functional form of $f(x,\theta)$ is given. Now the integral

(a) $$\nu = \int_{L_1(\hat\theta,\beta')}^{L_2(\hat\theta,\beta')} f(x,\theta)\,dx$$

is a random variable which represents the proportion of the population for which $L_1(\hat\theta,\beta') \le x \le L_2(\hat\theta,\beta')$. Assume that the distribution function of the integral (a) is independent of $\theta$. If $\hat\theta$ converges stochastically to $\theta$ as $n \to \infty$ (which is implied by the assumption (iv), Theorem (A), §6.24) then $L_1(\hat\theta,\beta')$ and $L_2(\hat\theta,\beta')$ converge stochastically to $L_1(\theta,\beta')$ and $L_2(\theta,\beta')$, respectively, and hence the integral (a) converges stochastically to $\beta'$. Therefore, for a given $\beta$ on the interval (0,1), and $\epsilon$ on the same interval, and choosing some $\beta'$ on the interval $(\beta,1)$, one can choose an $n$, say $n'$, such that for $n \ge n'$

(b) $$\Pr(\nu \ge \beta) > \epsilon,$$

no matter what value $\theta$ may have. For some values of $\beta$ and $\epsilon$, particularly those near unity (e. g., .95 or .99) there exists a smallest $n$, say $n_0$, such that

(c) $$\Pr(\nu \ge \beta) \ge \epsilon.$$

Therefore, under this condition $L_1(\hat\theta,\beta')$ and $L_2(\hat\theta,\beta')$ are $100\beta\,\%$ parameter-free tolerance limits at probability level $\epsilon$ (see §4.55)*. The interval $[L_1(\hat\theta,\beta'), L_2(\hat\theta,\beta')]$ on the x-axis may be referred to as a tolerance interval based on samples of size $n_0$ for covering or estimating at least $100\beta\,\%$ of the values of $x$ of the population, with a probability of at least $\epsilon$. These results may be extended to the case in which two or more parameters are involved in the distribution function of $x$.

It is evident that there are many ways of choosing tolerance limits as functions of $\hat\theta$ so that statement (b) can be made, e. g., $L_1$ and $L_2$ could be determined by cutting off unequal probabilities from the tails of the distribution function $f(x,\theta)$ rather than equal probabilities.

The reader should note carefully the distinction between a confidence interval statement (§6.11) about a population parameter and a tolerance interval statement (in this § and §4.55) about a population proportion. It will be seen, however, in the last example of §6.12 that the confidence statement about the proportion $p$ in a binomial population is closely analogous to a tolerance interval statement about a population proportion in the case of a population with a continuous random variable.

As an example of tolerance limits of this type involving two parameters consider a sample $O_n(x_1, x_2, \ldots, x_n)$ drawn from the normal distribution $N(a,\sigma^2)$. Let $\bar x$ be the sample mean and $s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x)^2$. Let $t_\beta$ be such that

$$\int_{-t_\beta}^{t_\beta} g_{n-1}(t)\,dt = \beta,$$

where $g_{n-1}(t)$ is the "Student" distribution with $n-1$ degrees of freedom (see §5.3). Let $L_1 = \bar x - t_\beta\sqrt{\tfrac{n+1}{n}}\,s$, $L_2 = \bar x + t_\beta\sqrt{\tfrac{n+1}{n}}\,s$. The proportion of the population having values of $x$ on the tolerance interval $\bar x \pm t_\beta\sqrt{\tfrac{n+1}{n}}\,s$ is

(e) $$\int_{L_1}^{L_2} N(a,\sigma^2)\,dx.$$

The distribution function of this integral is not known. However, it has been shown** that the mean value of this integral is $\beta$. Its variance has been determined only for large samples, which is $t_\beta^2\,e^{-t_\beta^2}/(\pi n)$ to terms of order $1/n$.

*For details of the approach to tolerance limits for large samples when the functional form of $f(x,\theta)$ is known, see A. Wald, "Setting of Tolerance Limits when the Sample is Large", Annals of Math. Stat., Vol. 13 (1942).

**S. S. Wilks, "Determination of Sample Size for Setting Tolerance Limits", Annals of Math. Stat., Vol. 12 (1941), pp. 94-95.

In the discussion thus far, it has been assumed that the functional form of the population distribution $f(x,\theta)$ is known but the value of $\theta$ is unknown. From the point of view of practical statistics the case in which $x$ is a continuous random variable with an unknown distribution is perhaps more important than the case in which the functional form is known. This case has been treated in §4.55*.
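
The statement that the mean coverage of the normal tolerance interval equals $\beta$ can be examined by simulation. The sketch below is illustrative only: it assumes the limits $\bar x \pm t_\beta\sqrt{(n+1)/n}\,s$, takes $n = 10$ and $\beta = .95$, and uses the tabulated Student point 2.262157 for 9 degrees of freedom as a hard-coded constant; the function name `mean_coverage`, the replication count and the seed are likewise assumptions.

```python
import math
import random
from statistics import NormalDist

# Assumed tabulated value: Pr(|t| <= 2.262157) = 0.95 for Student's t, 9 d.f.
T_BETA_DF9 = 2.262157


def mean_coverage(a=0.0, sigma=1.0, n=10, t_beta=T_BETA_DF9, reps=5000, seed=2):
    """Average population proportion covered by x-bar +/- t_beta*sqrt((n+1)/n)*s."""
    rng = random.Random(seed)
    population = NormalDist(a, sigma)
    k = t_beta * math.sqrt((n + 1) / n)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(a, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        # Proportion of the N(a, sigma^2) population between L1 and L2.
        total += population.cdf(xbar + k * s) - population.cdf(xbar - k * s)
    return total / reps


print("mean coverage, N(0,1):  ", round(mean_coverage(), 4))
print("mean coverage, N(3,25): ", round(mean_coverage(a=3.0, sigma=5.0), 4))
```

Both averages should lie close to .95, whatever values are taken for $a$ and $\sigma$, which is the parameter-free property required of the limits.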


6.4 The Fitting of Distribution Functions

The problem of fitting of distribution functions is as follows: Let $F(x,\theta_1,\theta_2,\ldots,\theta_h)$ be a c. d. f. depending on the $h$ parameters $\theta_1, \theta_2, \ldots, \theta_h$, and let $O_n$ be a sample of size $n$ from a population having this c. d. f. Consider the values $x_1, x_2, \ldots, x_n$ of the sample $O_n$ as $n$ values of a variable $x$. From these $n$ values we can construct an "empirical" c. d. f., say $F_n(x)$. The problem of fitting $F(x,\theta_1,\ldots,\theta_h)$ to $F_n(x)$ is that of determining $\theta_1, \ldots, \theta_h$ so that $F(x,\theta_1,\ldots,\theta_h)$ is approximately equal to $F_n(x)$ in some sense.

The method of maximum likelihood provides one method of determining values of $\theta_1, \theta_2, \ldots, \theta_h$ by maximizing the likelihood $\prod_{i=1}^n f(x_i,\theta_1,\theta_2,\ldots,\theta_h)$ with respect to the $\theta$'s. Clearly the values assigned to the parameters by this method are precisely their maximum likelihood estimates $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_h$ (§6.24). This method of fitting is best in the sense that for large $n$, the variance of each $\hat\theta_i$ is less than or equal to that of any other consistent and asymptotically normally distributed estimate of $\theta_i$.

Another method of fitting which is easy to apply in many problems is the method of moments. This method consists of equating the moments

$$\mu'_i = M'_i \qquad (i = 1,2,\ldots,h)$$

and solving for $\theta_1, \theta_2, \ldots, \theta_h$ (assuming $\mu'_i$ exists for $i = 1,2,\ldots,h$), where

$$\mu'_i = \int_{-\infty}^{\infty} x^i\,dF(x,\theta_1,\theta_2,\ldots,\theta_h),$$

$$M'_i = \int_{-\infty}^{\infty} x^i\,dF_n(x) = \frac{1}{n}\sum_{j=1}^n x_j^i.$$

*For further details, not given here, the reader is referred to S. S. Wilks, loc. cit., and also S. S. Wilks, "Statistical Prediction with Special Reference to the Problem of Tolerance Limits", Annals of Math. Stat., Vol. 13 (1942). An extension of the notion of tolerance limits to two or more variables is to be presented in a forthcoming paper in the Annals of Math. Stat. by A. Wald.

In the case of fitting certain distributions, for example the normal distribution, the binomial and Poisson distributions, the two methods of fitting yield the same results.
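
The agreement of the two methods for the normal distribution can be seen directly. In the sketch below (an illustration; the function names and the data are hypothetical), the population $N(a,\theta)$ has $\mu'_1 = a$ and $\mu'_2 = \theta + a^2$, so the method of moments gives $a = M'_1$ and $\theta = M'_2 - M'^2_1$, while maximum likelihood gives $\hat a = \bar x$ and $\hat\theta = S/n$ as in Example 4 of §6.24.

```python
import math


def fit_normal_by_moments(xs):
    """Solve mu'_1 = M'_1 and mu'_2 = M'_2 for the parameters of N(a, theta)."""
    n = len(xs)
    m1 = sum(xs) / n                  # M'_1, first sample moment
    m2 = sum(x * x for x in xs) / n   # M'_2, second sample moment
    return m1, m2 - m1 * m1           # a = M'_1, theta = M'_2 - M'_1^2


def fit_normal_by_ml(xs):
    """Maximum likelihood estimates a_hat = x-bar, theta_hat = S/n."""
    n = len(xs)
    xbar = sum(xs) / n
    return xbar, sum((x - xbar) ** 2 for x in xs) / n


data = [2.1, 3.4, 1.9, 5.0, 4.2, 2.8]  # hypothetical sample
print("moments:", fit_normal_by_moments(data))
print("ML:     ", fit_normal_by_ml(data))
```

The two pairs of values agree up to rounding error, since $M'_2 - M'^2_1 = \frac1n\sum (x_j-\bar x)^2$ identically.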



CHAPTER VII 


TESTS OF STATISTICAL HYPOTHESES 

Suppose the distribution function of a population depends on parameters $\theta_1, \theta_2, \ldots, \theta_h$. We assume the functional form of the distribution to be known, but not the true parameter values. Let $\Omega$ be the $h$-dimensional space of admissible parameter values. Denote the parameter point by $\Theta$. Let $\omega$ be a specified point set in $\Omega$: it may be of dimensionality 0, 1, ..., up to $h$. In this chapter we consider tests of the statistical hypothesis

$$H_0:\ \Theta \in \omega.$$

A test of $H_0$ is a procedure for accepting or rejecting $H_0$ on the evidence afforded by a sample from the population. A more precise definition of a test will be given in §7.3.

As a general rule one sets up a test with the hope of rejecting the hypothesis, and for this reason the hypothesis is often called a null hypothesis in such cases. Thus, if one desires confirmation of a suspicion that two populations have different means, one takes as the hypothesis that the means are equal, and if $H_0$ is rejected by the test, then one's suspicion is confirmed on the basis of the test used.

Statistical hypotheses are classified as follows: If $\omega$ is a single point of $\Omega$, that is, if $H_0$ states $\Theta = \Theta_0$, then $H_0$ is called simple; in any other case $H_0$ is called composite.

7.1 Statistical Tests Related to Confidence Intervals 

Consider the case where $H_0$ specifies the value of only one parameter $\theta_1$,

$$H_0:\ \theta_1 = \theta_1^0.$$

If the population distribution depends on no other parameters, this is a simple hypothesis. If other parameters $\theta_2, \ldots, \theta_h$ are present, $H_0$ is composite, $\omega$ being the $h-1$ dimensional subspace (hyperplane) in $\Omega$ defined by $\theta_1 = \theta_1^0$. If confidence intervals $\delta(O_n)$ for $\theta_1$ are available, then one may proceed as follows: Form $\delta(O_n)$ for the sample $O_n$, and reject $H_0$ unless $\delta(O_n)$ covers $\theta_1^0$. If $\epsilon$ is the confidence coefficient, then

$$\Pr(\text{rejecting } H_0 \text{ if it is true}) = 1 - \Pr\{\theta_1^0 \in \delta(O_n)\,|\,\theta_1 = \theta_1^0\} = 1 - \epsilon.$$

The quantity $\alpha = 1 - \epsilon$ is called the significance level of the test. It will be noted that when confidence intervals for $\theta_1$ are known, then a whole family of tests is at hand: A test exists for every $\theta_1^0$, that is, for every admissible value of $\theta_1$. We remark that beyond the statement Pr(rejecting $H_0$ if true) $= \alpha$, no further property of the test can be deduced from the definition of confidence intervals. One might ask about the Pr(accepting $H_0$ if false), that is, accepting $H_0$ when $\theta_1$ has some other value than $\theta_1^0$, but the significance level tells us nothing about this*. As will be seen in the examples below, our method usually leads us to the calculation of a certain statistic, say $T$, and $H_0$ is rejected if $T$ falls in a certain range $R$. Suppose, for example, that $R$ is the range $T > T_0$, and that $T$ possesses the p. d. f. $f(T)$ if $\theta_1 = \theta_1^0$. In certain cases it is sometimes said that $\alpha$ = Pr(finding a value of $T$ less probable than $T_0$ if $H_0$ is true). This really does not motivate the test any better: If by $T_1$ is less probable than $T_2$ we mean $f(T_1) < f(T_2)$, then the same test can be made with other statistics $S = \phi(T)$, and the relation "less probable" is not invariant under such transformations**.

It should be noted that confidence intervals give us a far more complete judgement about the parameter $\theta_1$ than significance tests. We also remark that if confidence regions (§6.14) for the set $\theta_1, \theta_2, \ldots, \theta_m$ are available, then so are significance tests for the hypothesis

$$H_0:\ \theta_i = \theta_i^0,\qquad i = 1,2,\ldots,m.$$

$H_0$ is simple if $m = h$, composite if $m < h$.
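
The rule "reject $H_0$ unless the confidence interval covers $\theta_1^0$" can be sketched in a few lines. The example below is an illustration under stated assumptions, not a procedure taken from the notes: the population is taken to be $N(a,1)$ with the familiar interval $\bar x \pm z/\sqrt{n}$ for $a$, and the function names, sample size, replication count and seed are hypothetical.

```python
import math
import random
from statistics import NormalDist


def ci_test_rejects(xs, a0, eps=0.95):
    """Reject H0: a = a0 unless the confidence interval for a covers a0.

    Assumes the sample comes from N(a, 1), so the interval with confidence
    coefficient eps is x-bar +/- z / sqrt(n).
    """
    n = len(xs)
    xbar = sum(xs) / n
    z = NormalDist().inv_cdf(0.5 + eps / 2.0)
    half_width = z / math.sqrt(n)
    covers = (xbar - half_width) <= a0 <= (xbar + half_width)
    return not covers


def rejection_rate(true_a, a0, n=25, eps=0.95, reps=4000, seed=3):
    """Proportion of simulated samples for which H0: a = a0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(true_a, 1.0) for _ in range(n)]
        rejections += ci_test_rejects(xs, a0, eps)
    return rejections / reps


print("Pr(reject | H0 true)       ~", rejection_rate(0.0, 0.0))
print("Pr(reject | true a = 0.5)  ~", rejection_rate(0.5, 0.0))
```

The first rate should be near the significance level $\alpha = 1 - \epsilon = .05$. The second is a property of the test that the confidence coefficient alone does not determine.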

Example 1: Suppose that on the basis of the sample $O_n$ from a population with the distribution $N(a,\sigma^2)$, where $a$ and $\sigma^2$ are unknown, we wish to test the (Student) hypothesis

$$H_0:\ a = a_0.$$

This is a composite hypothesis: The space $\Omega$ of admissible parameter points $(a,\sigma^2)$ is

*See §7.3.

**This may be shown by considering the signs of $f'(T)$ and $g'(S)$, where $g(S)$ is the p. d. f. of $S$.

(b) $$P_0 = P(O_n;a_0,\theta) = (2\pi\theta)^{-\frac12 n}\,e^{-\frac12[n(\bar x-a_0)^2+S]/\theta},$$

$$\log P_0 = -\tfrac12 n\log(2\pi) - \tfrac12 n\log\theta - \tfrac12[n(\bar x-a_0)^2+S]/\theta,$$

$$\partial P_0/\partial\theta = P_0\{-\tfrac12 n/\theta + \tfrac12[n(\bar x-a_0)^2+S]/\theta^2\}.$$

Equating this to zero and solving for $\theta$, we get

$$\hat\theta = (\bar x-a_0)^2 + S/n,$$

and substituting this into (b), we find

$$P_\omega(O_n) = \{2\pi[(\bar x-a_0)^2+S/n]\}^{-\frac12 n}\,e^{-\frac12 n}.$$

Hence

$$\lambda = [1+n(\bar x-a_0)^2/S]^{-\frac12 n}.$$

The distribution of $\lambda$ under the assumption that $H_0$ is true is independent of the unknown $\theta$, in fact

$$\lambda = [1+t^2/(n-1)]^{-\frac12 n},$$

where

$$t = \sqrt{n}(\bar x-a_0)/s$$

has the t-distribution $g_{n-1}(t)$ with $n-1$ degrees of freedom. Let $t = t_0$ correspond to $\lambda = \lambda_0$. Then $\lambda \le \lambda_0$ if and only if $|t| \ge t_0$. To get

$$\Pr(\lambda \le \lambda_0) = \alpha,$$

we define $t_0$ from

$$2\int_{t_0}^{\infty} g_{n-1}(t)\,dt = \alpha.$$

The likelihood ratio test for $H_0$ is seen to be the same as the (Student) test of Example 1, §7.1.
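
The identity between the likelihood ratio and the Student ratio can be confirmed numerically. The following sketch is illustrative (the function names and the simulated sample are assumptions): it evaluates $\lambda$ once as the ratio of the two maximized likelihoods of $N(a,\theta)$ and once from $[1+t^2/(n-1)]^{-n/2}$ with $s^2 = S/(n-1)$.

```python
import math
import random


def log_likelihood(xs, a, theta):
    """Log of P(O_n; a, theta) for a sample from N(a, theta)."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * theta)
            - 0.5 * sum((x - a) ** 2 for x in xs) / theta)


def likelihood_ratio(xs, a0):
    """Maximum of P over omega (a = a0) divided by maximum over Omega."""
    n = len(xs)
    xbar = sum(xs) / n
    S = sum((x - xbar) ** 2 for x in xs)
    theta_omega = (xbar - a0) ** 2 + S / n   # maximizing theta when a = a0
    theta_Omega = S / n                      # unrestricted maximum, with a = x-bar
    return math.exp(log_likelihood(xs, a0, theta_omega)
                    - log_likelihood(xs, xbar, theta_Omega))


def lambda_from_t(xs, a0):
    """[1 + t^2/(n-1)]^(-n/2) with t = sqrt(n)(x-bar - a0)/s."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    t = math.sqrt(n) * (xbar - a0) / s
    return (1 + t * t / (n - 1)) ** (-n / 2)


rng = random.Random(4)
sample = [rng.gauss(1.0, 2.0) for _ in range(12)]  # hypothetical sample
print(likelihood_ratio(sample, 0.5), lambda_from_t(sample, 0.5))
```

The two printed values coincide up to rounding error, and since $\lambda$ is a decreasing function of $|t|$, small values of $\lambda$ correspond to large values of $|t|$.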

In many cases the asymptotic distribution of the likelihood ratio is given by
Theorem (A): Suppose the c. d. f. of the population depends on parameters $\theta_1, \theta_2, \ldots, \theta_h$, and that $\lambda$ is the likelihood ratio for the hypothesis

In many problems (for instance, all the examples we have considered in §7.1) several tests, or a whole family of tests, are available, and the question arises, which is the "best" test? For the comparison of tests, Neyman and Pearson have introduced the concept of the power of a test. We approach this concept through the following steps:

First, we note that any test consists of the choice of a (B-meas.) region $w$ in the sample space and the rule that we reject $H_0$ if and only if the sample point $O_n$ falls in $w$. $w$ is called the critical region of the test. The power of the test is defined to be the probability that we reject $H_0$. This is a function of the critical region $w$ (a set function of $w$) and of the parameter point $\Theta$ (a point function of $\Theta$). We write it $P(w|\Theta)$ and note it is

$$P(w|\Theta) = \Pr(O_n \in w\,|\,\Theta).$$

The interpretation of the power function is based on the following observation: In using a test of $H_0$ two types of error are possible (exhaustive and mutually exclusive): (I) We may reject $H_0$ when it is true. (II) We may accept $H_0$ when it is false, i. e., when $\Theta$ is a point not in $\omega$. We call these respectively Type I and Type II errors. Now a Type I error can only occur if the true $\Theta \in \omega$. Hence the probability of making a Type I error if $\Theta \in \omega$ is

(a) $$\Pr(O_n \in w\,|\,\Theta \in \omega) = P(w|\Theta)\quad\text{for } \Theta \in \omega.$$

A Type II error can be committed only if $\Theta \notin \omega$. The probability of making a Type II error if $\Theta \notin \omega$ is

$$\Pr(O_n \notin w\,|\,\Theta \notin \omega) = 1 - \Pr(O_n \in w\,|\,\Theta \notin \omega) = 1 - P(w|\Theta)\quad\text{for } \Theta \notin \omega.$$

The significance of the power of a test is now seen to be the following: For $\Theta \in \omega$, $P(w|\Theta)$ is the probability of committing a Type I error; for $\Theta \notin \omega$, $P(w|\Theta)$ is the probability of avoiding a Type II error. We illustrate this discussion with an example of a one parameter case*.

*The regularity conditions are the same as those for the multi-parameter analogue of Theorem (A), §6.24.

Suppose $O_n$ is from $N(a,1)$, and that we wish to test the hypothesis

$$H_0:\ a = a_0.$$

Let $u_1$, $u_2$ be any two numbers, $-\infty \le u_1 < u_2 \le +\infty$, such that

(b) $$(2\pi)^{-\frac12}\int_{u_1}^{u_2} e^{-\frac12 u^2}\,du = 1 - \alpha.$$

Consider the test which consists of rejecting $H_0$ if

(c) $$\sqrt{n}(\bar x-a_0) < u_1\quad\text{or}\quad\sqrt{n}(\bar x-a_0) > u_2.$$

The critical region $w$ of the test is the part of the sample space defined by (c), that is, the region outside a certain pair of parallel hyperplanes (if $u_1 = -\infty$, or $u_2 = +\infty$, $w$ is a half-space). Let us calculate the power of the test:

$$P(w|a) = \Pr[\sqrt{n}(\bar x-a_0) < u_1 \text{ or } \sqrt{n}(\bar x-a_0) > u_2\,|\,a] = 1 - \Pr[u_1 \le \sqrt{n}(\bar x-a_0) \le u_2\,|\,a].$$

Now if the true parameter value is $a$,

$$u = \sqrt{n}(\bar x-a)$$

has the distribution $N(0,1)$. Write

$$\sqrt{n}(\bar x-a_0) = u + \sqrt{n}(a-a_0).$$

Then

(d) $$P(w|a) = 1 - (2\pi)^{-\frac12}\int_{u_1-\sqrt{n}(a-a_0)}^{u_2-\sqrt{n}(a-a_0)} e^{-\frac12 u^2}\,du.$$

*An elementary discussion of a simple case with several parameters may be found in a paper by H. Scheffé, "On the ratio of the variances of two normal populations", Annals of Mathematical Statistics, Vol. 13 (1942), No.

Each choice of the pair of limits $u_1$, $u_2$ satisfying (b) gives a test of $H_0$. Let us now consider the class $C$ of tests thus determined, and try to find which is the "best" test of the class $C$.

We note first that for all tests of the class,

$$\Pr(\text{Type I error}) = P(w|a_0) = \alpha,$$

from (d) and (b). This is what we have previously called the significance level of the test. To compare the tests we might consider the graphs of $P(w|a)$ against $a$ for the various tests; the graph for a given test is called the power curve of the test. We have seen that for every test of the class $C$, the power curve passes through the point $(a_0,\alpha)$. To find the shape of the power curve, we might plot points from (d), but by elementary methods we reach the following conclusions: The slope of the power curve corresponding to $(u_1, u_2)$ is zero if and only if $a$ is equal to

(e) $$a_m = a_0 + \tfrac12(u_1+u_2)/\sqrt{n}.$$

As $a \to +\infty$, $P(w|a) \to 1$, unless $u_2 = +\infty$, in which case $P \to 0$. As $a \to -\infty$, again $P \to 1$, unless $u_1 = -\infty$, in which case $P \to 0$. Also, $0 < P < 1$. Except for the cases $u_1 = -\infty$ or $u_2 = +\infty$, the power curve must then behave as follows: rising from a minimum at the value $a = a_m$ given by (e), it increases monotonically, approaching the asymptote $P = 1$ as $a \to \pm\infty$. It may be shown from (d) and (e) that its behavior is symmetrical with respect to the line $a = a_m$. In the exceptional cases, for $u_1 = -\infty$, $P$ increases monotonically from 0 to 1 as $a$ increases from $-\infty$ to $+\infty$; for $u_2 = +\infty$, $P$ decreases monotonically from 1 to 0. Some power curves are sketched in the figure:

to the left of $a_0$, (v) to the right. All the tests of class $C$ have power curves lying in the region between the curves (i) and (ii).

As far as the probability of avoiding errors of Type I is concerned, the tests are equivalent, since the curves all pass through $(a_0,\alpha)$. For $a \ne a_0$, we recall that the ordinate on the curve is the probability of avoiding a Type II error. For two tests of $H_0$, say $T_1$ and $T_2$, with critical regions $w_1$ and $w_2$, we say that $T_1$ is more powerful than $T_2$ for testing $H_0$: $a = a_0$ against an alternative $a = a'$ if $P(w_1|a') > P(w_2|a')$.

This means that if the true parameter value is $a'$, the probability of avoiding a Type II error is greater in using $T_1$ than $T_2$. Now for alternatives $a > a_0$, the power curve (i) lies above all other power curves of tests of class $C$, that is, the test obtained by taking $u_1 = -\infty$ is the most powerful of the class $C$ for all alternatives $a > a_0$. Hence this would be the best test of the class to use in a situation where we do not mind accepting $H_0$ if the true $a < a_0$, but want the most sensitive test of the class for rejecting $H_0$ when the true $a > a_0$. On the other hand, we see that this test is the worst of the lot, that is, the least powerful, for testing $H_0$ against alternatives $a < a_0$. For these alternatives the test with power curve (ii), obtained by taking $u_2 = +\infty$, is the most powerful. There is thus no test which is uniformly most powerful of the class $C$ for all alternatives $-\infty < a < +\infty$.
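
The behavior of the power curves of class $C$ can be tabulated from the normal integral. The sketch below is an illustration: it writes the power as $P(w|a) = 1 - [\Phi(u_2-\sqrt{n}(a-a_0)) - \Phi(u_1-\sqrt{n}(a-a_0))]$, where $\Phi$ is the $N(0,1)$ c. d. f., and the values of $\alpha$, $a_0$, $n$ and the four pairs $(u_1,u_2)$ are hypothetical choices satisfying (b).

```python
import math
from statistics import NormalDist

STD = NormalDist()  # N(0, 1)


def power(a, a0, n, u1, u2):
    """P(w|a): reject H0: a = a0 when sqrt(n)(x-bar - a0) < u1 or > u2."""
    shift = math.sqrt(n) * (a - a0)
    upper = 1.0 if u2 == math.inf else STD.cdf(u2 - shift)
    lower = 0.0 if u1 == -math.inf else STD.cdf(u1 - shift)
    return 1.0 - (upper - lower)


alpha, a0, n = 0.05, 0.0, 16
tests = {
    "u1 = -inf": (-math.inf, STD.inv_cdf(1 - alpha)),
    "u2 = +inf": (STD.inv_cdf(alpha), math.inf),
    "u1 = -u2": (STD.inv_cdf(alpha / 2), STD.inv_cdf(1 - alpha / 2)),
    "unequal tails": (STD.inv_cdf(0.01), STD.inv_cdf(0.96)),
}
for name, (u1, u2) in tests.items():
    row = [round(power(a, a0, n, u1, u2), 3) for a in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    print(f"{name:14s}", row)
```

Each row gives the power at $a = -1, -.5, 0, .5, 1$. Every curve takes the value $\alpha$ at $a_0$, the test with $u_1 = -\infty$ has the largest power for $a > a_0$ and the smallest for $a < a_0$, and the curve with unequal tails has its minimum at $a_m = a_0 + \tfrac12(u_1+u_2)/\sqrt{n}$ rather than at $a_0$.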

The situation described in the last sentence is the common one. To deal with it Neyman and Pearson defined an unbiased test as one for which $P(w|a)$ is minimum for $a = a_0$. The argument against biased tests in a situation where we are interested in testing a hypothesis against all possible alternatives is that for a biased test Pr(accepting $H_0$) is greater if $a$ has certain values $\ne a_0$ than if $a = a_0$. If we set $a_m = a_0$ in (e) we find that the unbiased test of the class $C$ is that for which $u_1 = -u_2$. This is the test of the class $C$ we should prefer, barring the "one-sided" situations where the tests with power curves (i) and (ii) are appropriate.

This serves to illustrate the comparison of tests by use of their power functions. Beyond this description of the underlying ideas of the Neyman-Pearson theory, it is not feasible to go into it further except for a few remarks: If one considers instead of the class $C$, the more inclusive class of all tests with critical regions $w$ for which $P(w|a_0) = \alpha$, there is again no uniformly most powerful test. However, the unbiased test obtained above is actually the uniformly most powerful unbiased test of this broader class.

Leaving the one parameter case now, we recall that the definition of the power of a test and its meaning in terms of the probability of committing Type I and Type II errors was given for the multiparameter case at the beginning of this section. Methods of finding optimum critical regions in the light of these concepts have been given by Neyman, Pearson, Wald and others, but there is still much work to be done. The problems of defining and finding "best" confidence intervals are related to those of "best" tests; the groundwork for such a theory has been laid by Neyman*. In conclusion, we recall the assumption made at the beginning of Chapter VII: that the functional form of the distribution is known for every possible parameter point. It is clear that in the application of the theory the calculations for the gain in efficiency by using a "best" test in preference to some other test will be invalidated if those calculations have been made for a distribution other than the true distribution. The whole theory introduced above presumes the knowledge of the functional form of the distribution.


*J. Neyman, "Outline of a theory of statistical estimation based on the classical theory of probability", Phil. Trans. Roy. Soc. London, Series A, Vol. 236 (1937), pp. 333-380.


NORMAL REGRESSION THEORY


In §2.9 certain ideas and definitions in regression theory were set forth and discussed. In the present chapter we shall consider sampling problems and tests of statistical hypotheses which arise in an important special type of regression theory which we shall refer to as normal regression theory. To be more specific, we shall assume that $y$ is a random variable distributed according to $N(\sum_{p=1}^k a_px_p,\sigma^2)$, where $x_1, x_2, \ldots, x_k$ are fixed variates, and consider samples of size $n$ from such a distribution. $N(\sum_{p=1}^k a_px_p,\sigma^2)$ is a conditional probability law of the form $f(y|x_1,x_2,\ldots,x_k)$. A sample of size $n$ will consist of $n$ sets of values $(y_\alpha|x_{1\alpha},x_{2\alpha},\ldots,x_{k\alpha})$, $\alpha = 1,2,\ldots,n$, where $y_1,\ldots,y_n$ are $n$ random variables, but where the $x_{p\alpha}$, $p = 1,2,\ldots,k$, $\alpha = 1,2,\ldots,n$, are fixed variates, and not random variables. We shall consider such problems as estimating (by confidence intervals and point estimation according to principles set forth in §6.1 and §6.2) values of the $a$'s and $\sigma^2$ from the sample and of testing certain statistical hypotheses regarding the $a$'s. We shall also consider applications of normal regression theory to certain problems in analysis of variance, including row-column and Latin square lay-outs.

8.1 Case of One Fixed Variate 

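In order to fix our ideas in the regression problem, we shall first consider in detail the case in which $y$ is distributed according to $N(a+bx,\sigma^2)$. Let $O_n$: $(y_\alpha|x_\alpha)$, $\alpha = 1,2,\ldots,n$, be a sample of size $n$ from a population having this distribution. The probability element for the sample is

(a) $$dF(y_1,\ldots,y_n) = \prod_{\alpha=1}^n N(a+bx_\alpha,\sigma^2)\,dy_\alpha = \left[\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-\frac{1}{2\sigma^2}\sum_{\alpha=1}^n (y_\alpha-a-bx_\alpha)^2}\right]dy_1\cdots dy_n.$$
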

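Maximizing the likelihood function (that enclosed in [ ]) with respect to $\sigma^2$, $a$, $b$, we find in accordance with §6.24 that $\hat a$ and $\hat b$ are given by solving

(b) $$n\hat a + \hat b\sum_\alpha x_\alpha = \sum_\alpha y_\alpha,\qquad \hat a\sum_\alpha x_\alpha + \hat b\sum_\alpha x_\alpha^2 = \sum_\alpha x_\alpha y_\alpha,$$

and $\hat\sigma^2$ is given by

(c) $$\hat\sigma^2 = \frac{1}{n}\sum_\alpha (y_\alpha-\hat a-\hat bx_\alpha)^2.$$

Solving (b) we obtain

(d) $$\hat a = \bar y - \hat b\bar x,\qquad \hat b = \frac{\sum_\alpha (x_\alpha-\bar x)(y_\alpha-\bar y)}{\sum_\alpha (x_\alpha-\bar x)^2}.$$

In order to be able to solve (b), we must have $\sum_\alpha (x_\alpha-\bar x)^2 \ne 0$. Now $\hat a$ and $\hat b$ are linear functions of $y_1,\ldots,y_n$, and it follows from Theorem (C) of §3.23 that $\hat a$ and $\hat b$ are jointly distributed according to a normal bivariate law with

$$E(\hat a) = a,\qquad E(\hat b) = b,$$

$$\sigma_{\hat a}^2 = \frac{\sigma^2\sum_\alpha x_\alpha^2}{n\sum_\alpha (x_\alpha-\bar x)^2},\qquad \sigma_{\hat b}^2 = \frac{\sigma^2}{\sum_\alpha (x_\alpha-\bar x)^2},$$

$$\operatorname{cov}(\hat a,\hat b) = -\frac{\bar x\,\sigma^2}{\sum_\alpha (x_\alpha-\bar x)^2},$$

where $\bar x = \frac{1}{n}\sum_\alpha x_\alpha$ and $\bar y = \frac{1}{n}\sum_\alpha y_\alpha$.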
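
As a small worked illustration of the case of one fixed variate, the sketch below computes the maximum likelihood estimates for $N(a+bx,\sigma^2)$, taking them to be the usual least squares solution $\hat b = \sum(x_\alpha-\bar x)(y_\alpha-\bar y)/\sum(x_\alpha-\bar x)^2$, $\hat a = \bar y - \hat b\bar x$, $\hat\sigma^2 = \frac1n\sum(y_\alpha-\hat a-\hat bx_\alpha)^2$, and checks that they satisfy the two normal equations. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Maximum likelihood estimates (a_hat, b_hat, sigma2_hat) for N(a + b*x, sigma^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        # The normal equations cannot be solved when all x values coincide.
        raise ValueError("sum of (x - xbar)^2 must not vanish")
    b_hat = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a_hat = ybar - b_hat * xbar
    sigma2_hat = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys)) / n
    return a_hat, b_hat, sigma2_hat


xs = [0.0, 1.0, 2.0, 3.0, 4.0]      # fixed variates (hypothetical)
ys = [1.9, 5.2, 7.8, 11.1, 14.0]    # observed y values (hypothetical)
a_hat, b_hat, sigma2_hat = fit_line(xs, ys)

# Residuals of the two normal equations; both should vanish up to rounding.
eq1 = len(xs) * a_hat + b_hat * sum(xs) - sum(ys)
eq2 = a_hat * sum(xs) + b_hat * sum(x * x for x in xs) - sum(x * y for x, y in zip(xs, ys))
print(a_hat, b_hat, sigma2_hat, eq1, eq2)
```

Note that $\hat\sigma^2$ uses the divisor $n$, as the maximum likelihood estimate does, and not $n-2$.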


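The sum of squares in the exponent of (a) may be written as

$$\sum_\alpha (y_\alpha-a-bx_\alpha)^2 = \sum_\alpha (y_\alpha-\hat a-\hat bx_\alpha)^2 + \Big[n(\hat a-a)^2 + 2\sum_\alpha x_\alpha(\hat a-a)(\hat b-b) + \sum_\alpha x_\alpha^2(\hat b-b)^2\Big].$$

and hence $\sum_{p,q} a_{pq}c_pc_q$ cannot vanish. Now the $\hat a_p$ and hence the $(\hat a_p-a_p)$ are linear functions of the random variables $y_1, y_2, \ldots, y_n$. Therefore, the $\hat a_p$ are distributed according to a normal $k$-variate distribution. The variance of $\hat a_p$ is $\sigma^2a^{pp}$. Similarly, the covariance of $\hat a_p$ and $\hat a_q$ is $\sigma^2a^{pq}$.

It will be noted that (b) can be written as

$$\sum_{p=1}^k a_{pq}(\hat a_p-a_p) = \sum_{\alpha=1}^n x_{q\alpha}\Big(y_\alpha-\sum_{p=1}^k a_px_{p\alpha}\Big)\qquad (q = 1,2,\ldots,k)$$

which shows that $(\hat a_p-a_p)$, $(p = 1,2,\ldots,k)$, are homogeneous linear functions of $(y_\alpha-\sum_{p=1}^k a_px_{p\alpha})$. Hence $y_\alpha-\sum_{p=1}^k \hat a_px_{p\alpha}$, $(\alpha = 1,2,\ldots,n)$, are also homogeneous linear functions of $(y_\alpha-\sum_{p=1}^k a_px_{p\alpha})$.

Now

$$\frac{1}{\sigma^2}\sum_{\alpha=1}^n\Big(y_\alpha-\sum_{p=1}^k a_px_{p\alpha}\Big)^2 = q_1 + q_2,$$

where

(f) $$q_1 = \frac{1}{\sigma^2}\sum_{\alpha=1}^n\Big(y_\alpha-\sum_{p=1}^k \hat a_px_{p\alpha}\Big)^2,\qquad q_2 = \frac{1}{\sigma^2}\sum_{p,q=1}^k a_{pq}(\hat a_p-a_p)(\hat a_q-a_q).$$

Hence, $q_1$ and $q_2$ are homogeneous quadratic forms in $(y_\alpha-\sum_p a_px_{p\alpha})$. Since $\hat a_p-a_p$ are distributed according to a $k$-variate normal law with variance-covariance matrix $\|\sigma^2a^{pq}\|$, it follows from §5.22 that $q_2$ is distributed according to the $\chi^2$-law with $k$ degrees of freedom. Similarly, we know that

$$\frac{1}{\sigma^2}\sum_{\alpha=1}^n\Big(y_\alpha-\sum_{p=1}^k a_px_{p\alpha}\Big)^2$$

is distributed according to the $\chi^2$-law with $n$ degrees of freedom. Therefore, by Cochran's Theorem, §5.24, $q_1$ is distributed according to the $\chi^2$-law with $n-k$ degrees of freedom and independently of $q_2$ (i. e., the $\hat a_p-a_p$).

Consider the sum of squares in (c); we may write

$$\sum_{\alpha=1}^n\Big(y_\alpha-\sum_{p=1}^k \hat a_px_{p\alpha}\Big)^2 = a_{00} - 2\sum_{p=1}^k \hat a_pa_{0p} + \sum_{p,q=1}^k \hat a_p\hat a_qa_{pq}.$$

But, it follows from §2.94 that this expression reduces to

(g) $$\begin{vmatrix} a_{00} & a_{01} & \cdots & a_{0k}\\ a_{10} & a_{11} & \cdots & a_{1k}\\ \vdots & \vdots & & \vdots\\ a_{k0} & a_{k1} & \cdots & a_{kk}\end{vmatrix}\Bigg/\begin{vmatrix} a_{11} & \cdots & a_{1k}\\ \vdots & & \vdots\\ a_{k1} & \cdots & a_{kk}\end{vmatrix}.$$
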

We may summarize in

Theorem (A): Let $O_n$: $(y_\alpha|x_{1\alpha},x_{2\alpha},\ldots,x_{k\alpha})$, $(\alpha = 1,2,\ldots,n)$, be a sample of size $n$ from a population with distribution $N(\sum_{p=1}^k a_px_p,\sigma^2)$, where $x_{1\alpha}, x_{2\alpha}, \ldots, x_{k\alpha}$, $(\alpha = 1,2,\ldots,n)$, are linearly independent. Let $a_{pq} = \sum_\alpha x_{p\alpha}x_{q\alpha}$, $a_{0p} = \sum_\alpha y_\alpha x_{p\alpha}$, and $a_{00} = \sum_\alpha y_\alpha^2$. Then

(1) The maximum likelihood estimates of the $a_p$ and $\sigma^2$ are given by (e) and (c).

(2) The quantities $(\hat a_p-a_p)$, $(p = 1,2,\ldots,k)$, are distributed according to a $k$-variable normal law with zero means and variance-covariance matrix $\|\sigma^2a^{pq}\|$.

(3) The quantity $\frac{1}{\sigma^2}\sum_\alpha (y_\alpha-\sum_p \hat a_px_{p\alpha})^2$, in which the sum of squares may be expressed as in (g) as the ratio of two determinants, is distributed according to a $\chi^2$-law with $n-k$ degrees of freedom, and independently of $(\hat a_p-a_p)$.

Making use of the results as stated in Theorem (A), one may set up confidence limits (or a significance test) for any $a_p$ by setting up the appropriate Student ratio. Or one may set up confidence limits for $\sigma^2$ by using $q_1$. Confidence regions may be set up for all of the $a_p$ or any sub-set of them by setting up a Snedecor F ratio, in which the numerator sum of squares is the exponent in the normal distribution of the corresponding set of $\hat a_p$ and the denominator sum of squares is $q_1$.

An Alternative Proof of the Independence of $q_1$ and $q_2$.

The proof which has been given for establishing the independence of the two above expressions in the probability sense depends upon Cochran's Theorem. The independence can also be established by the use of moment generating functions. Let $\phi(\theta_1,\theta_2)$ be the moment generating function defined as

$$\phi(\theta_1,\theta_2) = E(e^{\theta_1q_1+\theta_2q_2}),$$

where $q_1$ and $q_2$ are defined in (f). If we can show that

$$\phi(\theta_1,\theta_2) = (1-2\theta_1)^{-\frac{n-k}{2}}(1-2\theta_2)^{-\frac{k}{2}}$$

then it follows by Theorem (B) in §2.81 and (e) of §3.5 that $q_1$ and $q_2$ are independently distributed according to $\chi^2$-laws with $n-k$ and $k$ degrees of freedom respectively.

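Let $\frac{1}{\sigma}(y_\alpha-\sum_{p=1}^k a_px_{p\alpha}) = z_\alpha$; write equations (b) as

$$\sum_{\alpha=1}^n x_{q\alpha}\Big[\sigma z_\alpha - \sum_{p=1}^k (\hat a_p-a_p)x_{p\alpha}\Big] = 0\qquad (q = 1,2,\ldots,k)$$

which reduces to

$$\sum_{p=1}^k a_{pq}(\hat a_p-a_p) = \sigma a'_{0q},\qquad a'_{0q} = \sum_{\alpha=1}^n x_{q\alpha}z_\alpha,$$

from which we have

$$(\hat a_p-a_p) = \sigma\sum_{q=1}^k a^{pq}a'_{0q}.$$

Hence

$$q_1 = \sum_{\alpha=1}^n z_\alpha^2 - \sum_{p,q=1}^k a^{pq}a'_{0p}a'_{0q} = \sum_{\alpha=1}^n z_\alpha^2 - \sum_{\alpha,\beta=1}^n A_{\alpha\beta}z_\alpha z_\beta,$$

where

$$A_{\alpha\beta} = \sum_{p,q=1}^k a^{pq}x_{p\alpha}x_{q\beta}.$$

For $q_2$ we have

$$q_2 = \frac{1}{\sigma^2}\sum_{p,q=1}^k a_{pq}(\hat a_p-a_p)(\hat a_q-a_q) = \sum_{\alpha,\beta=1}^n A_{\alpha\beta}z_\alpha z_\beta.$$

The probability element associated with the sample is given by (a). Making the transformation $\frac{1}{\sigma}(y_\alpha-\sum_p a_px_{p\alpha}) = z_\alpha$, $\alpha = 1,2,\ldots,n$, we obtain as the probability element of the $z_\alpha$ the expression

$$(2\pi)^{-\frac{n}{2}}\,e^{-\frac12\sum_\alpha z_\alpha^2}\,dz_1dz_2\cdots dz_n.$$

For the m. g. f. we have

$$\phi(\theta_1,\theta_2) = (2\pi)^{-\frac{n}{2}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-\frac12 Q}\,dz_1dz_2\cdots dz_n,$$

where

$$Q = \sum_\alpha z_\alpha^2 - 2\theta_1q_1 - 2\theta_2q_2 = \sum_{\alpha,\beta=1}^n B_{\alpha\beta}z_\alpha z_\beta,$$

where $B_{\alpha\alpha} = 1 - 2\theta_1 + 2(\theta_1-\theta_2)A_{\alpha\alpha}$
and $B_{\alpha\beta} = 2(\theta_1-\theta_2)A_{\alpha\beta}$, $\alpha \ne \beta$.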

The value of the n- tuple Integral la ^ idiere B la the determinant IB „| . To evaluate 


follows 

(letting 1 

1 

CM 

1 

M, 2(e,-e2) - N): 

M+NA^ ^ 

NA^ 2 • • • 


*11 * 2 i*“*ki 

CM 

M4‘NA2 2 • • • 

NAan 

*12 * 22" ‘*102 


®- “^1 ^ *ln ^2n*--^kn 


.0 1 

. 0 


Suppoae the (n+p)-th column la multiplied by - N(^ a^^^x ) - C , aay, p - and 

q *"1 " ^ 

added to the a-th column a - 1 , 2 , We obtain the following expression for B 


M 

0 . 


*11 

*21- 

"•*ki 

0 

M . 
• • 


*12 

^22* 

• ' '*k2 


• 

• 

• 

• 



• 

• 

• 

• 

* 

0 

6.. 

M 

*1n 

*2n‘ 

•••*kn 

C. , 

C- ^ . 

c. 

1 

0 • . • 

. . .0 

^1 1 

12 

^in 






n 

0 

1 ... 

. . • 0 

721 

722 : 

72n 

• 

• 

• 


• 

• • 

• 

• 

• • 


• 

• • 

• 

• 

• 


• 

• # 

• 

• 



*^2 ^ ° 


Now auppoae the a-th column (a- i,2,...,n) be multiplied by and added to the (n+pj-th 

column (p « l,2,...,k). Noting that 




qta^^pci 


I ^ 


b p^q 


■pq' )1 ]>.q < 
Curvilinear regression is also a special case. For example, for quadratic regression in two variables, say u and v, we would let $x_1 = 1$, $x_2 = u$, $x_3 = v$, $x_4 = u^2$, $x_5 = v^2$, $x_6 = uv$.
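
As an illustration only, the substitution just described can be sketched in a few lines of Python. The function names and the column ordering $(1, u, v, u^2, v^2, uv)$ are assumptions made for this sketch; the point is simply that a quadratic regression in u and v is linear in the coefficients $a_p$ once the $x_p$ are defined this way.

```python
def quadratic_design_row(u, v):
    """Return (x1, ..., x6) = (1, u, v, u^2, v^2, uv) for one observation (u, v)."""
    return [1.0, u, v, u * u, v * v, u * v]


def regression_value(a, u, v):
    """Evaluate the regression function sum_p a_p x_p at the point (u, v)."""
    return sum(a_p * x_p for a_p, x_p in zip(a, quadratic_design_row(u, v)))


# One observation with u = 2, v = 3.
row = quadratic_design_row(2.0, 3.0)
# Coefficients chosen so that the regression function is 1 + u^2 + v^2.
value = regression_value([1, 0, 0, 1, 1, 0], 2.0, 3.0)
```

Stacking one such row per observation gives the array of $x_{p\alpha}$ to which the linear theory applies unchanged.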

8.3 A General Normal Regression Significance Test

The following general significance test frequently arises in normal regression theory: A sample $O_n$: $(y_\alpha \mid x_{1\alpha}, x_{2\alpha}, \ldots, x_{k\alpha})$, $\alpha = 1,2,\ldots,n$, is assumed to be drawn from a population with distribution $N(\sum_{p=1}^{k} a_p x_p, \sigma^2)$, and it is desired to test the hypothesis that $a_{r+1}, a_{r+2}, \ldots, a_k$ $(r<k)$ have specified values, say $a_{r+1,0}, a_{r+2,0}, \ldots, a_{k,0}$, respectively, no matter what values $a_1, a_2, \ldots, a_r$ and $\sigma^2$ may have. For example, all specified values may be zero, in which case the problem is to test the hypothesis that y, which is assumed to be distributed according to $N(\sum_{p=1}^{k} a_p x_p, \sigma^2)$, is actually independent of $x_{r+1}, x_{r+2}, \ldots, x_k$.

In order to determine the test function (of the $y_\alpha$ and $x_{p\alpha}$) for testing the hypothesis we shall make use of the method of likelihood ratios discussed in §7.2.

The probability element of the sample is

(a)    $dP(y_1,\ldots,y_n) = P(O_n; a_1,a_2,\ldots,a_k,\sigma^2)\,dy_1\cdots dy_n$,

where

(b)    $P(O_n; a_1,a_2,\ldots,a_k,\sigma^2) = (2\pi\sigma^2)^{-\frac{n}{2}}\exp\Bigl[-\frac{1}{2\sigma^2}\sum_{\alpha=1}^{n}\bigl(y_\alpha - \sum_{p=1}^{k} a_p x_{p\alpha}\bigr)^2\Bigr]$

is the likelihood function.

Let $\Omega$ be the (k+1)-dimensional parameter space for which $\sigma^2 > 0$, $-\infty < a_p < +\infty$, $p = 1,2,\ldots,k$, and let $\omega$ be the (k-r)-dimensional subspace of $\Omega$ for which $a_{r+1} = a_{r+1,0}$, $a_{r+2} = a_{r+2,0}$, ..., $a_k = a_{k,0}$. If $H_0$ denotes the hypothesis to be tested, then $H_0$ is the hypothesis that the true parameter point lies in $\omega$, where the admissible points are those in $\Omega$.

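The likelihood ratio $\lambda$ for testing $H_0$ is given by

(c)    $\lambda = \dfrac{\max_{\omega} P(O_n; a_1,\ldots,a_k,\sigma^2)}{\max_{\Omega} P(O_n; a_1,\ldots,a_k,\sigma^2)}$,

where the denominator is the maximum of $P(O_n; a_1,\ldots,a_k,\sigma^2)$ for variations of the parameters over $\Omega$ and the numerator is the maximum for variations of the parameters over $\omega$. To find the maximum of the likelihood function (b) over $\Omega$, we follow the ordinary procedure of taking the first derivative of the likelihood function with respect to each parameter and setting the derivatives equal to zero. We find that the maximizing values are

(d)    $\hat\sigma_\Omega^2 = \frac{1}{n}\sum_{\alpha=1}^{n}\bigl(y_\alpha - \sum_{p=1}^{k}\hat a_p x_{p\alpha}\bigr)^2$,    $a_p = \hat a_p$,

with the $\hat a_p$ as given in §8.2. Substituting these in the likelihood function we find

    $\max_{\Omega} P(O_n; a_1,\ldots,a_k,\sigma^2) = (2\pi\hat\sigma_\Omega^2)^{-\frac{n}{2}}\, e^{-\frac{n}{2}}$.

Similarly, by maximizing the likelihood over $\omega$, we set $a_{r+1} = a_{r+1,0}, \ldots, a_k = a_{k,0}$, and differentiate with respect to $\sigma^2, a_1, \ldots, a_r$, obtaining

(e)    $\hat\sigma_\omega^2 = \frac{1}{n}\sum_{\alpha=1}^{n}\bigl(y'_\alpha - \sum_{u=1}^{r}\hat a'_u x_{u\alpha}\bigr)^2$,    $\hat a'_u = \sum_{v=1}^{r} a'^{uv}a'_{0v}$,

where $y'_\alpha = y_\alpha - \sum_{g=r+1}^{k} a_{g,0}x_{g\alpha}$, $a'_{0u} = \sum_{\alpha} y'_\alpha x_{u\alpha}$, and $\|a'^{uv}\| = \|a_{uv}\|^{-1}$, $u,v = 1,2,\ldots,r$. Substituting in the likelihood function, we find

    $\max_{\omega} P(O_n; a_1,\ldots,a_k,\sigma^2) = (2\pi\hat\sigma_\omega^2)^{-\frac{n}{2}}\, e^{-\frac{n}{2}}$.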

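Therefore

(f)    $\lambda = \Bigl(\dfrac{\hat\sigma_\Omega^2}{\hat\sigma_\omega^2}\Bigr)^{\frac{n}{2}}$.

Now it is clear that $\hat\sigma_\omega^2 \geq \hat\sigma_\Omega^2$, since $\hat\sigma_\Omega^2$ is the minimum of $\frac{1}{n}\sum_{\alpha=1}^{n}\bigl(y_\alpha - \sum_{p=1}^{k} a_p x_{p\alpha}\bigr)^2$ for variations of $a_1,\ldots,a_k$, while $\hat\sigma_\omega^2$ is the minimum for variations of $a_1,\ldots,a_r$, for fixed values of $a_{r+1},\ldots,a_k$. Let

    $q_1 = \dfrac{n\hat\sigma_\Omega^2}{\sigma^2}$    and    $q_2 = \dfrac{n(\hat\sigma_\omega^2 - \hat\sigma_\Omega^2)}{\sigma^2}$.

The difference $n(\hat\sigma_\omega^2 - \hat\sigma_\Omega^2)$ is simply the further reduction in the sum of squares $\sum_{\alpha}\bigl(y_\alpha - \sum_{p} a_p x_{p\alpha}\bigr)^2$ obtainable by varying $a_{r+1},\ldots,a_k$ in addition to $a_1,a_2,\ldots,a_r$. Expressed in terms of $q_1$ and $q_2$,

(g)    $\lambda = \Bigl(\dfrac{\hat\sigma_\Omega^2}{\hat\sigma_\omega^2}\Bigr)^{\frac{n}{2}} = \Bigl(\dfrac{1}{1+q_2/q_1}\Bigr)^{\frac{n}{2}}$.

Thus $\lambda$ is a single-valued function of $q_2/q_1$, which means that $q_2/q_1$ is equivalent to $\lambda$ as a test function. The nearer the value of $\lambda$ to unity, the smaller the value of $q_2$ as compared with $q_1$. To complete the problem of setting up a test for testing $H_0$ we must now obtain the distribution of $\lambda$ (or $q_2/q_1$) under the assumption that the hypothesis is true, i.e., that $O_n$ has been drawn from $N(\sum_{p} a_p x_p, \sigma^2)$ for $a_{r+1} = a_{r+1,0}, \ldots, a_k = a_{k,0}$.

We shall now show that if $H_0$ is true then $q_1$ and $q_2$ are distributed independently according to $\chi^2$-laws with $n-k$ and $k-r$ degrees of freedom respectively.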
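
The construction of $q_2/q_1$ from two minimizations can be sketched numerically. The following is a minimal sketch under stated assumptions, not the author's procedure: the helper names (`solve`, `min_sum_sq`, `f_ratio`) and the toy data are hypothetical, and the minima are obtained by solving the normal equations directly. It returns $F = \frac{(n-k)\,q_2}{(k-r)\,q_1}$ together with $\lambda = (1+q_2/q_1)^{-n/2}$, neither of which depends on the unknown $\sigma^2$.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda row_idx: abs(M[row_idx][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def min_sum_sq(X, y):
    """Minimum over the a_p of sum_alpha (y_alpha - sum_p a_p x_p_alpha)^2."""
    k = len(X[0])
    A = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    b = [sum(row[p] * ya for row, ya in zip(X, y)) for p in range(k)]
    a = solve(A, b)
    return sum((ya - sum(ap * xp for ap, xp in zip(a, row))) ** 2
               for row, ya in zip(X, y))


def f_ratio(X, y, r, a0):
    """F and lambda for H0: (a_{r+1}, ..., a_k) = a0, the other a's and sigma^2 free."""
    n, k = len(X), len(X[0])
    s_full = min_sum_sq(X, y)                      # equals sigma^2 * q1
    y_adj = [ya - sum(g * x for g, x in zip(a0, row[r:])) for row, ya in zip(X, y)]
    s_restricted = min_sum_sq([row[:r] for row in X], y_adj)
    ratio = (s_restricted - s_full) / s_full       # q2 / q1
    lam = (1.0 / (1.0 + ratio)) ** (n / 2.0)
    return (n - k) * ratio / (k - r), lam


# Toy data: straight-line regression, testing that the slope is zero (k = 2, r = 1).
X = [[1.0, x] for x in (0.0, 1.0, 2.0, 3.0, 4.0)]
y = [1.0, 2.1, 2.9, 4.2, 4.8]
F, lam = f_ratio(X, y, 1, [0.0])
```

A large F (equivalently a small $\lambda$) speaks against the hypothesis; critical values come from the F distribution with $k-r$ and $n-k$ degrees of freedom.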

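The probability element for the sample $O_n$ from the population having distribution $N(\sum_{p} a_p x_p, \sigma^2)$ is given by (a). Now as we have seen in §8.2, the sum of squares in the exponent can be written as

(h)    $\sum_{\alpha=1}^{n}\bigl(y_\alpha - \sum_{p=1}^{k}\hat a_p x_{p\alpha}\bigr)^2 + \sum_{p,q=1}^{k} a_{pq}(\hat a_p - a_p)(\hat a_q - a_q)$.

The second expression in (h) may be written as

(i)    $\sum_{u,v=1}^{r} a_{uv}(\hat a_u - a_u - l_u)(\hat a_v - a_v - l_v) + \sum_{g,h=r+1}^{k} b_{gh}(\hat a_g - a_g)(\hat a_h - a_h)$,

where the $l_u$ $(u = 1,2,\ldots,r)$ are linear functions of the $(\hat a_g - a_g)$, $(g = r+1,\ldots,k)$, and where $\|b_{gh}\| = \|a^{gh}\|^{-1}$, $g,h = r+1,\ldots,k$, where $a^{gh}$ is the element in the g-th row and h-th column in $\|a_{pq}\|^{-1}$, $p,q = 1,2,\ldots,k$. See §3.23.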
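To verify the statement that expression (i) is equal to the second expression in (h), let us denote $d_p = \hat a_p - a_p$ and $l_u = \sum_{g=r+1}^{k} l_{ug}d_g$. We must then determine the $l_{ug}$ and the $b_{gh}$ so that

(j)    $\sum_{p,q=1}^{k} a_{pq}d_pd_q = \sum_{u,v=1}^{r} a_{uv}(d_u - l_u)(d_v - l_v) + \sum_{g,h=r+1}^{k} b_{gh}d_gd_h$,

that is, an identity in the d's.

Taking $\dfrac{\partial^2}{\partial d_v\,\partial d_g}$ of both sides of this identity we get

(k)    $a_{vg} = -\sum_{u=1}^{r} a_{uv}l_{ug}$,    $v = 1,2,\ldots,r$,  $g = r+1,\ldots,k$,

and hence

(l)    $l_{ug} = -\sum_{v=1}^{r} a'^{uv}a_{vg}$,

where $\|a'^{uv}\| = \|a_{uv}\|^{-1}$, $u,v = 1,2,\ldots,r$.

Taking $\dfrac{\partial^2}{\partial d_g\,\partial d_h}$ of both sides of (j) we get

(m)    $\sum_{u,v=1}^{r} a_{uv}l_{ug}l_{vh} + b_{gh} = a_{gh}$.

Using (k) and (l) we find that (m) reduces to

(n)    $\sum_{u,v=1}^{r} a'^{uv}a_{uh}a_{vg} + b_{gh} = a_{gh}$,

or

(o)    $b_{gh} = a_{gh} - \sum_{u,v=1}^{r} a'^{uv}a_{uh}a_{vg}$.

Referring to §2.94 it will be seen that

(p)    $b_{gh} = \dfrac{1}{|a_{uv}|}\begin{vmatrix} a_{11} & \cdots & a_{1r} & a_{1h} \\ \vdots & & \vdots & \vdots \\ a_{r1} & \cdots & a_{rr} & a_{rh} \\ a_{g1} & \cdots & a_{gr} & a_{gh} \end{vmatrix}$,

which is equal to the term in the g-th row and h-th column in the inverse of $\|a^{gh}\|$, $g,h = r+1,\ldots,k$.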

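Making use of the fact that the sum of the squares in the exponent of the likelihood function in (a) is

(q)    $\sum_{\alpha=1}^{n}\bigl(y_\alpha - \sum_{p=1}^{k}\hat a_p x_{p\alpha}\bigr)^2$ + expression (i),

it is now clear that by maximizing the likelihood function for variations of $a_1,\ldots,a_r$, $\sigma^2$ in $\omega$, we find

(r)    $\hat\sigma_\omega^2 = \frac{1}{n}\Bigl[\sum_{\alpha=1}^{n}\bigl(y_\alpha - \sum_{p=1}^{k}\hat a_p x_{p\alpha}\bigr)^2 + \sum_{g,h=r+1}^{k} b_{gh}(\hat a_g - a_{g,0})(\hat a_h - a_{h,0})\Bigr]$,

since the first expression in (i) vanishes when $a_1,\ldots,a_r$ are varied so as to maximize the likelihood function.

Remembering that when the likelihood function is maximized with respect to $a_1,\ldots,a_k$, $\sigma^2$ over $\Omega$ we obtain $\hat\sigma_\Omega^2$ as given in (d), we clearly have

(s)    $q_2 = \dfrac{n(\hat\sigma_\omega^2 - \hat\sigma_\Omega^2)}{\sigma^2} = \dfrac{1}{\sigma^2}\sum_{g,h=r+1}^{k} b_{gh}(\hat a_g - a_{g,0})(\hat a_h - a_{h,0})$,

which is a function of the $\hat a_g$ $(g = r+1,\ldots,k)$ which, as we have seen in §8.2, are distri-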

o 





freedom . respectively. when Ij, true . 
(4) The likelihood criterion A for teatlr 


1§ g iven ts 


■1+Vq, 1+ 

where P Is Snedecor * s ratio which la distributed ac 




idien true . 


8.4 Remarks on the Generality of Theorem (A), §8.3

In order to emphasize the generality of Theorem (A), §8.3, we shall discuss briefly several cases of particular interest.

8.41 Case 1. It frequently happens that the following statistical hypothesis is to be tested on basis of a sample $O_n$: $(y_\alpha \mid x_{1\alpha}, x_{2\alpha}, \ldots, x_{k\alpha})$, $\alpha = 1,2,\ldots,n$, assumed to have been drawn from a population with distribution $N(\sum_{p} a_p x_p, \sigma^2)$:

    $\Omega$:  $-\infty < a_p < +\infty$,  $\sigma^2 > 0$,  $p = 1,2,\ldots,k$,

    $\omega$:  Region in $\Omega$ for which $\sum_{p=1}^{k} c_{up}a_p = 0$,  $\sigma^2 > 0$,  $u = r+1, r+2, \ldots, k$,

where the $c_{up}$ are linearly independent constants. In other words the hypothesis to be tested here is that there are $k-r$ linear restrictions among the $a_p$, given that the sample is from a population with distribution $N(\sum_{p=1}^{k} a_p x_p, \sigma^2)$. Denoting this statistical hypothesis by $H'$, we may verify from Theorem (A) that the likelihood criterion for testing $H'$ is of the same form as $\lambda$ for $H_0$, where $\sigma^2 q_1$ is the minimum of $S = \sum_{\alpha}\bigl(y_\alpha - \sum_{p} a_p x_{p\alpha}\bigr)^2$ for variations of $a_1,\ldots,a_k$ over $\Omega$, and $\sigma^2 q_2$ is the difference between the minimum of S over $\omega$ and the minimum of S over $\Omega$. As in the case of $q_1$ and $q_2$ for $H_0$, these quantities are independently distributed according to $\chi^2$-laws with $n-k$ and $k-r$ degrees of freedom respectively, when $H'$ is true. To verify that this is true, we transform the $a_p$ as follows:

    $a'_u = a_u$,    $u = 1,2,\ldots,r$,
    $a'_g = \sum_{p=1}^{k} c_{gp}a_p$,    $g = r+1,\ldots,k$.

We may write this transformation as

    $a'_q = \sum_{p=1}^{k} c_{qp}a_p$,    $q = 1,2,\ldots,k$,

where, for $q \leq r$,

    $c_{qp} = 1$ if $q = p$,
    $c_{qp} = 0$ otherwise.

Without loss of generality we may assume the $c_{qp}$ to be such that $|c_{qp}| \neq 0$. Hence

    $a_p = \sum_{q=1}^{k} c^{pq}a'_q$

and $\sum_{p} a_p x_p = \sum_{q} a'_q x'_q$, where $x'_q = \sum_{p} c^{pq}x_p$. Therefore $H'$ may be expressed as the statistical hypothesis:

    $\Omega$:  $-\infty < a'_p < \infty$,  $\sigma^2 > 0$,  $p = 1,2,\ldots,k$,

    $\omega$:  Region in $\Omega$ for which $a'_g = 0$,  $\sigma^2 > 0$,  $g = r+1,\ldots,k$,

which is to be tested on basis of the sample $O_n$: $(y_\alpha \mid x'_{1\alpha}, \ldots, x'_{k\alpha})$, $\alpha = 1,2,\ldots,n$, drawn from a population with distribution $N(\sum_{p} a'_p x'_p, \sigma^2)$. Theorem (A) is immediately applicable to $H'$ as expressed in this form.

8.42 Case 2. The following statistical hypothesis frequently arises, to be tested on basis of a sample $O_n$ from $N(\sum_{p} a_p x_p, \sigma^2)$:

    $\Omega$:  $-\infty < a_p < \infty$,  $\sigma^2 > 0$,  $p = 1,2,\ldots,k$;

    $\omega$:  Region in $\Omega$ for which $a_p = \sum_{u=1}^{r} c_{pu}a'_u$,  $\sigma^2 > 0$,  $p = 1,2,\ldots,k$.

In other words the hypothesis is that the $a_p$ can each be expressed linearly in terms of r $(<k)$ parameters $a'_1,\ldots,a'_r$, where the $c_{pu}$ are given. By using the transformation $a_p = \sum_{q=1}^{k} c_{pq}a'_q$, where the $c_{pq}$ $(q = r+1,\ldots,k$; $p = 1,2,\ldots,k)$ are further given numbers such that $|c_{pq}| \neq 0$, we can express the hypothesis as follows:

    $\Omega$:  $-\infty < a'_p < \infty$,  $\sigma^2 > 0$,  $p = 1,2,\ldots,k$;

    $\omega$:  Region in $\Omega$ for which $a'_g = 0$,  $g = r+1,\ldots,k$,

to be tested on basis of the sample $O_n$: $(y_\alpha \mid x'_{1\alpha}, \ldots, x'_{k\alpha})$, where $x'_{q\alpha} = \sum_{p} c_{pq}x_{p\alpha}$, from a population with distribution $N(\sum_{p} a'_p x'_p, \sigma^2)$. This case is clearly covered by Theorem (A).

In this case $\sigma^2 q_1$ is the minimum of $S = \sum_{\alpha}\bigl(y_\alpha - \sum_{p} a_p x_{p\alpha}\bigr)^2$ for variations of $a_1,\ldots,a_k$, while $\sigma^2 q_2$ is the difference between the minimum of S for variations of the $a_p$ over $\Omega$ and that of S for variations of the $a_p$ over $\omega$ (i.e., for unrestricted variations of $a'_u$, $u = 1,2,\ldots,r$, when the $a_p$ are replaced by $\sum_{q} c_{pq}a'_q$ in S and the $a'_g$ are set equal to 0, $g = r+1,\ldots,k$). $q_1$ and $q_2$ are independently distributed according to $\chi^2$-laws with $n-k$ and $k-r$ degrees of freedom, respectively, when the hypothesis is true.


8.43 Case 3. The following variant of the hypothesis of §8.3 arises in such problems as randomized blocks (see §9.2), Latin squares (see §9.4), etc., to be tested on basis of a sample $O_n$: $(y_\alpha \mid x_{1\alpha}, \ldots, x_{k\alpha})$, $\alpha = 1,2,\ldots,n$. Denoting this hypothesis by $H''$, it is specified as follows:

    $\Omega$:  $-\infty < a_p < +\infty$,  $\sigma^2 > 0$,  $p = 1,2,\ldots,k$, with the $a_p$ restricted by the $r_1$ linearly independent conditions
               $\sum_{p=1}^{k} c_{vp}a_p = 0$,  $v = 1,2,\ldots,r_1$  $(r_1 < k)$.

    $\omega$:  The subspace in $\Omega$ for which $\sum_{p=1}^{k} c_{vp}a_p = 0$,  $v = 1,2,\ldots,r_2$, where $r_1 < r_2 < k$.

$H''$ is the hypothesis that the $a_p$ satisfy $r_2 - r_1$ further linear restrictions, assuming that $r_1$ linear restrictions are fulfilled, linear independence being assumed throughout. In this case $\sigma^2 q_1$ is the minimum of $S = \sum_{\alpha}\bigl(y_\alpha - \sum_{p} a_p x_{p\alpha}\bigr)^2$ for variations of the $a_p$ over $\Omega$ (i.e., for variations of the $a_p$ subject to the restrictions $\sum_{p} c_{vp}a_p = 0$, $v = 1,2,\ldots,r_1$) while $\sigma^2 q_2$ is the difference between this minimum and that for variations of the $a_p$ over $\omega$ (i.e., for variations of the $a_p$ subject to the restrictions $\sum_{p} c_{vp}a_p = 0$, $v = 1,2,\ldots,r_2$). When $H''$ is true, $q_1$ and $q_2$ are independently distributed according to $\chi^2$-laws with $n - k + r_1$ and $r_2 - r_1$ degrees of freedom respectively.

That this case is covered by Theorem (A) may be seen by considering the non-singular transformation $a'_q = \sum_{p=1}^{k} d_{qp}a_p$, $q = 1,2,\ldots,k$, where the $d_{qp}$ are given numbers such that $|d_{qp}| \neq 0$ (with $d_{vp} = c_{vp}$ for $v = 1,2,\ldots,r_2$). We have $a_p = \sum_{q=1}^{k} d^{pq}a'_q$, which transforms $\sum_{p=1}^{k} a_p x_p$ into $\sum_{p=1}^{k} a'_p x'_p$, where $x'_q = \sum_{p} d^{pq}x_p$. Now under $\Omega$ the regression function is $\sum_{p=r_1+1}^{k} a'_p x'_p$, and we may therefore specify $H''$ as

    $\Omega$:  $-\infty < a'_p < \infty$,  $\sigma^2 > 0$,  $p = r_1+1,\ldots,k$  ($a'_1,\ldots,a'_{r_1}$ being assumed 0 from the outset).

    $\omega$:  Subspace in $\Omega$ for which $a'_{r_1+1} = a'_{r_1+2} = \cdots = a'_{r_2} = 0$.

The applicability of Theorem (A) is now obvious.

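8.5 The Minimum of a Sum of Squares of Deviations with Respect to Regression Coefficients which are Subject to Linear Restrictions

It will be noted in §§8.3 and 8.4 that frequently we have to find the minimum of

(a)    $S = \sum_{\alpha=1}^{n}\bigl(y_\alpha - \sum_{p=1}^{k} a_p x_{p\alpha}\bigr)^2$

for variations of the $a_p$, when the $a_p$ are subject to one or more linear restrictions. The object of this section is to give an explicit expression for the minimum of the sum of squares, under such conditions, as a ratio of two determinants.

Let us consider the problem of finding the minimum of the sum of squares (a), when the $a_p$ are subject to the linear restrictions

(b)    $\sum_{p=1}^{k} c_{up}a_p = 0$    $(u = 1,2,\ldots,r < k)$.

We shall use the method of Lagrange, §4.7, and write

(c)    $F(a_1,a_2,\ldots,a_k;\lambda_1,\ldots,\lambda_r) = S + 2\sum_{u=1}^{r}\lambda_u\sum_{p=1}^{k} c_{up}a_p$.

It is necessary that

(d)    $0 = \dfrac{\partial F}{\partial a_q} = \dfrac{\partial S}{\partial a_q} + 2\sum_{u=1}^{r}\lambda_u c_{uq}$    $(q = 1,2,\ldots,k)$

in order for S to have an extremum (in this case a minimum). Performing the differentiation $\dfrac{\partial S}{\partial a_q}$, these equations may be written as

(e)    $\sum_{p=1}^{k} a_{pq}a_p - a_{0q} + \sum_{u=1}^{r}\lambda_u c_{uq} = 0$    $(q = 1,2,\ldots,k)$,

where $a_{pq}$, $a_{0q}$ (and $a_{00}$) are defined in (d) of §8.2. Multiplying each of (e) by $a_q$ and summing from $q = 1$ to $k$, we get, by (b),

(f)    $\sum_{p,q=1}^{k} a_{pq}a_pa_q - \sum_{q=1}^{k} a_{0q}a_q = 0$.

Expanding S, we obtain

(g)    $S = a_{00} - 2\sum_{p=1}^{k} a_{0p}a_p + \sum_{p,q=1}^{k} a_{pq}a_pa_q$,

and making use of (f),

(h)    $\sum_{p=1}^{k} a_{0p}a_p = a_{00} - S$.

Rewriting (h) and (e) with $a_0 = 1$, and using the conditions (b), we obtain the following homogeneous linear equations in the $1 + k + r$ quantities $a_0, a_1, \ldots, a_k, \lambda_1, \ldots, \lambda_r$:

(i)    $(S - a_{00})a_0 + \sum_{p=1}^{k} a_{0p}a_p = 0$,
       $-a_{q0}a_0 + \sum_{p=1}^{k} a_{qp}a_p + \sum_{u=1}^{r} c_{uq}\lambda_u = 0$    $(q = 1,2,\ldots,k)$,
       $\sum_{p=1}^{k} c_{up}a_p = 0$    $(u = 1,2,\ldots,r)$.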


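In order for these equations to have a non-vanishing solution the determinant of the $1 + k + r$ equations must satisfy the well-known condition of being 0, i.e.,

(j)
\[
\begin{vmatrix}
(S-a_{00}) & a_{01} & \cdots & a_{0k} & 0 & \cdots & 0 \\
(0-a_{10}) & a_{11} & \cdots & a_{1k} & c_{11} & \cdots & c_{r1} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
(0-a_{k0}) & a_{k1} & \cdots & a_{kk} & c_{1k} & \cdots & c_{rk} \\
(0-0) & c_{11} & \cdots & c_{1k} & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
(0-0) & c_{r1} & \cdots & c_{rk} & 0 & \cdots & 0
\end{vmatrix} = 0.
\]

Treating the first column as a sum of 2 columns as indicated and employing the usual rule for expressing the determinant as the sum of two determinants, we find the minimum value of S to be given by

(k)    $S_{\min} = \dfrac{\Delta}{\Delta_{00}}$,

where

\[
\Delta = \begin{vmatrix}
a_{00} & a_{01} & \cdots & a_{0k} & 0 & \cdots & 0 \\
a_{10} & a_{11} & \cdots & a_{1k} & c_{11} & \cdots & c_{r1} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
a_{k0} & a_{k1} & \cdots & a_{kk} & c_{1k} & \cdots & c_{rk} \\
0 & c_{11} & \cdots & c_{1k} & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & c_{r1} & \cdots & c_{rk} & 0 & \cdots & 0
\end{vmatrix}
\]

and $\Delta_{00}$ is the minor of $a_{00}$ in $\Delta$.

It should be noted that the values of the $a_p$ and $\lambda_u$ which yield the extremum of F (or the values of the $a_p$ which yield the minimum value of S) are given by the last $k + r$ linear equations in (i) with $a_0 = 1$.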
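
The determinant ratio for the restricted minimum can be checked on a small case. The sketch below is illustrative only: the function names and the four-observation data set are hypothetical, and it assumes the quantities $a_{pq} = \sum_\alpha x_{p\alpha}x_{q\alpha}$, $a_{0p} = \sum_\alpha y_\alpha x_{p\alpha}$, $a_{00} = \sum_\alpha y_\alpha^2$ of §8.2. Exact rational arithmetic is used so that the ratio $\Delta/\Delta_{00}$ can be compared with a hand computation.

```python
from fractions import Fraction


def det(M):
    """Determinant by exact Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        piv = next((i for i in range(c, n) if M[i][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d                      # a row interchange changes the sign
        d *= M[c][c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n):
                M[i][j] -= f * M[c][j]
    return d


def s_min_by_determinants(X, y, C):
    """Delta / Delta_00 for the bordered array of a_00, a_0p, a_pq and c_up."""
    k, r = len(X[0]), len(C)
    a = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    a0 = [sum(ya * row[p] for row, ya in zip(X, y)) for p in range(k)]
    a00 = sum(ya * ya for ya in y)
    top = [[a00] + a0 + [0] * r]
    mid = [[a0[p]] + a[p] + [C[u][p] for u in range(r)] for p in range(k)]
    bot = [[0] + list(C[u]) + [0] * r for u in range(r)]
    delta = top + mid + bot
    delta00 = [row[1:] for row in delta[1:]]   # minor of a_00
    return det(delta) / det(delta00)


# Two groups of two observations; x_1, x_2 are group indicators.
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [1, 3, 4, 8]
restricted = s_min_by_determinants(X, y, [[1, -1]])   # restriction a_1 - a_2 = 0
unrestricted = s_min_by_determinants(X, y, [])        # no restrictions (r = 0)
```

Here the unrestricted minimum is the within-group sum of squares, $(1-2)^2+(3-2)^2+(4-6)^2+(8-6)^2 = 10$, and the restricted minimum is the sum of squares about the common mean 4, namely 26.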



CHAPTER IX 


APPLICATIONS OF NORMAL REGRESSION THEORY TO ANALYSIS OF VARIANCE PROBLEMS 


In this chapter we shall consider some applications of normal regression theory together with the general significance test embodied in Theorem (A), §8.3, to certain problems in the field of statistical analysis known as analysis of variance. This field of analysis is due primarily to R. A. Fisher.

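9.1 Testing for the Equality of Means of Normal Populations with the Same Variance

Suppose $O_{n_p}$: $(y_{p\alpha})$, $\alpha = 1,2,\ldots,n_p$, $p = 1,2,\ldots,k$, are samples from $N(a_1,\sigma^2)$, $N(a_2,\sigma^2)$, ..., $N(a_k,\sigma^2)$ respectively, and that it is desired to test the statistical hypothesis $H(a_1 = a_2 = \cdots = a_k)$ specified as follows:

    $\Omega$:  $-\infty < a_p < \infty$,  $\sigma^2 > 0$,  $p = 1,2,\ldots,k$.

    $\omega$:  $a_p = a$,  $-\infty < a < \infty$,  $\sigma^2 > 0$,  $p = 1,2,\ldots,k$.

In other words this is the statistical hypothesis that all of the samples are drawn from normal populations with identical means, given that the populations are normal and have equal variances. The probability element for the k samples is

(a)    $(2\pi\sigma^2)^{-\frac{n}{2}}\exp\Bigl[-\frac{1}{2\sigma^2}\sum_{p=1}^{k}\sum_{\alpha=1}^{n_p}(y_{p\alpha}-a_p)^2\Bigr]\prod_{p=1}^{k}\prod_{\alpha=1}^{n_p} dy_{p\alpha}$,    $n = \sum_{p=1}^{k} n_p$.

Maximizing the likelihood function (i.e., the expression in [ ]) for variations of the parameters over $\Omega$, we obtain

(b)    $\hat a_p = \bar y_p$,    $\hat\sigma_\Omega^2 = \frac{1}{n}\sum_{p=1}^{k}\sum_{\alpha=1}^{n_p}(y_{p\alpha}-\bar y_p)^2$,

where $\bar y_p = \frac{1}{n_p}\sum_{\alpha=1}^{n_p} y_{p\alpha}$, the mean of the y's in the p-th sample.

Maximizing the likelihood for variations of the parameters over $\omega$, we obtain

(c)    $\hat a = \bar y$,    $\hat\sigma_\omega^2 = \frac{1}{n}\sum_{p=1}^{k}\sum_{\alpha=1}^{n_p}(y_{p\alpha}-\bar y)^2$,

where $\bar y = \frac{1}{n}\sum_{p=1}^{k}\sum_{\alpha=1}^{n_p} y_{p\alpha}$.

Now $q_1$ and $q_2$ of Theorem (A), §8.3, are as follows:

(d)    $q_1 = \frac{1}{\sigma^2}\sum_{p=1}^{k}\sum_{\alpha=1}^{n_p}(y_{p\alpha}-\bar y_p)^2$,    $q_2 = \frac{1}{\sigma^2}\sum_{p=1}^{k} n_p(\bar y_p-\bar y)^2$.

Assuming $H(a_1 = a_2 = \cdots = a_k)$ is true (i.e., that $a_1 = a_2 = \cdots = a_k$), it follows from Theorem (A), §8.3, that $q_1$ and $q_2$ are independently distributed according to $\chi^2$-laws with $n-k$ and $k-1$ degrees of freedom respectively. Hence

(e)    $F = \dfrac{(n-k)\,q_2}{(k-1)\,q_1} = \dfrac{(n-k)\sum_{p=1}^{k} n_p(\bar y_p-\bar y)^2}{(k-1)\sum_{p=1}^{k}\sum_{\alpha=1}^{n_p}(y_{p\alpha}-\bar y_p)^2}$

is distributed according to $h_{k-1,\,n-k}(F)\,dF$.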
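
For concreteness, the ratio in (e) may be computed as in the following minimal sketch. The function name `one_way_f` and the data are hypothetical; the computation uses only the within-sample and between-sample sums of squares, which are $\sigma^2 q_1$ and $\sigma^2 q_2$, so $\sigma^2$ never needs to be known.

```python
def one_way_f(samples):
    """Return (F, df1, df2) for the hypothesis a_1 = ... = a_k from k samples."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    # sigma^2 * q1: deviations of observations from their own sample mean
    within = sum((v - m) ** 2 for s, m in zip(samples, means) for v in s)
    # sigma^2 * q2: weighted deviations of sample means from the grand mean
    between = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    return (n - k) * between / ((k - 1) * within), k - 1, n - k


F, df1, df2 = one_way_f([[1.0, 3.0], [4.0, 8.0]])
```

With these two samples the within sum of squares is 10 and the between sum of squares is 16, so $F = 2\cdot 16/10 = 3.2$ on 1 and 2 degrees of freedom.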

To see exactly how this problem is an application of Theorem (A), the reader should refer to §8.41, Case 1. It will be noted that the set of k samples, $O_{n_1}, O_{n_2}, \ldots, O_{n_k}$, can be regarded as a single sample of size n ($n = \sum_{p=1}^{k} n_p$) from a population with distribution $N(\sum_{p=1}^{k} a_p x_p, \sigma^2)$, where $x_1 = 1$, $x_2 = \cdots = x_k = 0$ for $O_{n_1}$; $x_2 = 1$, $x_1 = x_3 = \cdots = x_k = 0$ for $O_{n_2}$; and so on. The hypothesis $H(a_1 = a_2 = \cdots = a_k)$ is that all $a_p$ are equal, i.e.,

    $a_1 - a_q = 0$,    $q = 2,3,\ldots,k$,

which are $k-1$ linear restrictions on the $a_p$ of the kind considered in Case 1.

9.2 Randomized Blocks or Two-way Layouts


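Suppose $y_{ij}$ ($i = 1,2,\ldots,r$; $j = 1,2,\ldots,s$) are random variables independently distributed according to $N(m + R_i + C_j, \sigma^2)$ where $\sum_{i=1}^{r} R_i = \sum_{j=1}^{s} C_j = 0$, and that we wish to test on basis of the $y_{ij}$ the hypothesis $H[(C_j) = 0]$ specified as follows:

    $\Omega$:  $-\infty < m, R_i, C_j < \infty$,  $\sigma^2 > 0$,  $i = 1,2,\ldots,r$;  $j = 1,2,\ldots,s$,
               $\sum_{i=1}^{r} R_i = \sum_{j=1}^{s} C_j = 0$.

    $\omega$:  The subspace in $\Omega$ obtained by setting each $C_j = 0$.

The $\omega$ space is simply the subspace in $\Omega$ for which the $C_j$ are all 0. The probability element for the sample (i.e., the $y_{ij}$) is

(a)    $(2\pi\sigma^2)^{-\frac{rs}{2}}\exp\Bigl[-\frac{1}{2\sigma^2}\sum_{i,j}(y_{ij}-m-R_i-C_j)^2\Bigr]\prod_{i,j} dy_{ij}$.

The sum of squares in the exponent of (a) may be written as

    $\sum_{i,j}(y_{ij}-\bar y_{i\cdot}-\bar y_{\cdot j}+\bar y)^2 + \sum_{i,j}(\bar y_{i\cdot}-\bar y-R_i)^2 + \sum_{i,j}(\bar y_{\cdot j}-\bar y-C_j)^2 + rs(\bar y-m)^2$,

where $\bar y_{i\cdot} = \frac{1}{s}\sum_{j} y_{ij}$, $\bar y_{\cdot j} = \frac{1}{r}\sum_{i} y_{ij}$ and $\bar y = \frac{1}{rs}\sum_{i,j} y_{ij}$.
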

$H[(R_i) = 0]$ defined as follows:

    $\Omega$:  Same as for $\Omega$ in definition of $H[(C_j) = 0]$.

    $\omega$:  The subspace in $\Omega$ obtained by setting each $R_i = 0$.

Following steps similar to those followed for $H[(C_j) = 0]$ we find for $H[(R_i) = 0]$,

(c)    $F = \dfrac{(s-1)\sum_{i,j}(\bar y_{i\cdot}-\bar y)^2}{\sum_{i,j}(y_{ij}-\bar y_{i\cdot}-\bar y_{\cdot j}+\bar y)^2}$,

which will be distributed according to $h_{(r-1),\,(r-1)(s-1)}(F)\,dF$ when $H[(R_i) = 0]$ is true.

The applicability of Theorem (A), §8.3, in testing $H[(C_j) = 0]$ is evident when it is noted that under $\Omega$ the $y_{ij}$ can be regarded as a sample of size rs from a population having a distribution of the form $N(\sum_{p=1}^{r+s+1} a_p x_p, \sigma^2)$ in which there are two homogeneous linear conditions on the $a_p$ (the $a_p$ being written in place of the m, $R_i$, $C_j$ and each x having the value 0 or 1), whereas under $\omega$ there would be $s + 1$ linear conditions on the $a_p$ (or $s - 1$ linear conditions in addition to those already imposed under $\Omega$). Both $H[(C_j) = 0]$ and $H[(R_i) = 0]$ come under Case 3, §8.43.

If the $y_{ij}$, $i = 1,2,\ldots,r$; $j = 1,2,\ldots,s$, are considered in a rectangular array with i referring to rows and j to columns, then it will be seen that we are assuming that $y_{ij}$ is a normally distributed random variable with a mean which is the sum of three parts: a general constant m, a specific constant $R_i$ associated with the i-th row and a specific constant $C_j$ associated with the j-th column (where $\sum_{i} R_i = \sum_{j} C_j = 0$). The variance is assumed to be independent of i and j. Statistically speaking, $R_i$ is often referred to as the effect (or main effect) due to the i-th row, and $C_j$ the effect (or main effect) due to the j-th column. $H[(R_i) = 0]$ is therefore the hypothesis that row effects are zero no matter what the values of m and column effects. The quantity $\sum_{i,j}(y_{ij}-\bar y_{i\cdot}-\bar y_{\cdot j}+\bar y)^2$ is often referred to as "error" or "residual" sum of squares after row and column effects are removed, and when divided by $(r-1)(s-1)$ the resulting expression provides an unbiased estimate of $\sigma^2$ no matter what the values of m, the $R_i$ and $C_j$. $\sum_{i,j}(\bar y_{i\cdot}-\bar y)^2$ is usually referred to as sum of squares due to rows, and when divided by $r - 1$, the resulting quotient provides an unbiased estimate of $\sigma^2$ (and, as we have seen, independent of that obtained by using "error" sum of squares) if the $R_i = 0$, no matter what values the $C_j$ and m may have. A similar statement holds for $\sum_{i,j}(\bar y_{\cdot j}-\bar y)^2$. It can be shown by Cochran's Theorem and by the use of moment generating functions, although we shall not do so here, that $\frac{1}{\sigma^2}\sum_{i,j}(y_{ij}-\bar y_{i\cdot}-\bar y_{\cdot j}+\bar y)^2$, $\frac{1}{\sigma^2}\sum_{i,j}(\bar y_{i\cdot}-\bar y)^2$, $\frac{1}{\sigma^2}\sum_{i,j}(\bar y_{\cdot j}-\bar y)^2$ are independently distributed according to $\chi^2$-laws with $(r-1)(s-1)$, $(r-1)$, $(s-1)$ degrees of freedom respectively, if the $R_i$ and $C_j$ are all zero, and furthermore the sum of the three quantities is $\frac{1}{\sigma^2}\sum_{i,j}(y_{ij}-\bar y)^2$, which is distributed according to the $\chi^2$-law with $rs-1$ degrees of freedom if each $R_i$ and each $C_j$ is zero.

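These various sums of squares together with their degrees of freedom are commonly set forth in an analysis of variance table as follows (each $\sum$ extending over all values of i and j):

    Variation Due to    Sum of Squares                                                                    Degrees of Freedom
    Rows                $S_R = \sum(\bar y_{i\cdot}-\bar y)^2$                                            $r-1$
    Columns             $S_C = \sum(\bar y_{\cdot j}-\bar y)^2$                                           $s-1$
    Error               $S_E = \sum(y_{ij}-\bar y_{i\cdot}-\bar y_{\cdot j}+\bar y)^2$                    $(r-1)(s-1)$
    Total               $S = \sum(y_{ij}-\bar y)^2$                                                       $rs-1$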


The main facts regarding the constituents of this Table may be summarized as follows:

(1) $S = S_R + S_C + S_E$.

(2) $S_R/\sigma^2$, $S_C/\sigma^2$, $S_E/\sigma^2$ are independently distributed according to $\chi^2$-laws with $(r-1)$, $(s-1)$, $(r-1)(s-1)$ degrees of freedom respectively if all $R_i$ and $C_j$ are zero.

(3) $F = \dfrac{(s-1)S_R}{S_E}$ is distributed according to $h_{(r-1),\,(r-1)(s-1)}(F)\,dF$ when $H[(R_i) = 0]$ is true.

(4) $F = \dfrac{(r-1)S_C}{S_E}$ is distributed according to $h_{(s-1),\,(r-1)(s-1)}(F)\,dF$ when $H[(C_j) = 0]$ is true.

(5) $S_E/\sigma^2$ is distributed according to the $\chi^2$-law with $(r-1)(s-1)$ degrees of freedom for any parameter point in $\Omega$ (i.e., no matter what values the $R_i$ and $C_j$ may have).

(6) $S/\sigma^2$ is distributed according to the $\chi^2$-law with $rs-1$ degrees of freedom if all $R_i$ and $C_j$ are zero.
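
The entries of the table and the two F ratios can be computed directly from a rectangular array. The following is a minimal sketch with hypothetical names and data; sums over i alone are multiplied by s (and sums over j alone by r) so that each sum of squares extends over all cells.

```python
def two_way_anova(y):
    """y[i][j] with r rows and s columns; returns the sums of squares and F ratios."""
    r, s = len(y), len(y[0])
    g = sum(map(sum, y)) / (r * s)                       # grand mean
    row = [sum(y[i]) / s for i in range(r)]              # row means
    col = [sum(y[i][j] for i in range(r)) / r for j in range(s)]  # column means
    S_R = s * sum((m - g) ** 2 for m in row)
    S_C = r * sum((m - g) ** 2 for m in col)
    S_E = sum((y[i][j] - row[i] - col[j] + g) ** 2
              for i in range(r) for j in range(s))
    S = sum((y[i][j] - g) ** 2 for i in range(r) for j in range(s))
    return {
        "S_R": S_R, "S_C": S_C, "S_E": S_E, "S": S,
        "F_rows": (s - 1) * S_R / S_E,   # (r-1) and (r-1)(s-1) degrees of freedom
        "F_cols": (r - 1) * S_C / S_E,   # (s-1) and (r-1)(s-1) degrees of freedom
    }


table = two_way_anova([[1.0, 2.0, 4.0], [3.0, 5.0, 6.0]])
```

For this array the column means are 2, 3.5 and 5 about a grand mean of 3.5, so the column sum of squares is $2(2.25 + 0 + 2.25) = 9$, and the three components add up to the total 17.5.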

The theory discussed in this section has been widely used in what are called designed experiments, particularly in agricultural science. For example, rows in our rectangular array may be associated with r different varieties of wheat, columns with s different types of fertilizer, and $y_{ij}$ with the yield of wheat on the plot of soil associated with the i-th variety and j-th fertilizer, it being assumed that plots are of the same size and the soil homogeneous for all plots. In such an application, we emphasize that the fundamental assumptions are that the yield on the plot associated with the i-th variety and the j-th fertilizer may be regarded as a normally distributed random variable having mean value of the form $m + R_i + C_j$ (where $\sum_{i} R_i = \sum_{j} C_j = 0$), and a variance which has the same value for all i and j. The question of whether the assumptions are tenable in any given case is one for the individual applying the method to settle. In this example $H[(C_j) = 0]$ would be the hypothesis that fertilizer effects on yield are all equal no matter what the variety effects may be.

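9.3 Three-way and Higher Order Layouts; Interaction

The analysis presented in §9.2 can be extended to three-way and higher order layouts. In this section we shall consider in detail the three-way layout. Let $y_{ijk}$ ($i = 1,2,\ldots,r$; $j = 1,2,\ldots,s$; $k = 1,2,\ldots,t$) be random variables distributed independently according to

(a)    $N(m + I_{ijk}, \sigma^2)$,

where

(b)    $I_{ijk} = I_{ij0} + I_{i0k} + I_{0jk} + I_{i00} + I_{0j0} + I_{00k}$,

where each set of I's on the right hand side of (b) is such that when summed over each index the sum is zero. Thus there are $(r-1)(s-1)$ linearly independent constants in the set $\{I_{ij0}\}$, $(r-1)$ such constants in the set $\{I_{i00}\}$, with similar statements holding for the remaining sets. For convenience, we may consider $y_{ijk}$ as a random variable associated with the cell in the i-th row, j-th column and k-th layer of a three-dimensional rectangular array of cells. The mean value of $y_{ijk}$ is given in (a), in which the $I_{i00}$, the $I_{0j0}$ and the $I_{00k}$ are row, column and layer main effects, respectively; the $I_{ij0}$ row-column interactions*, the $I_{i0k}$ row-layer interactions, and the $I_{0jk}$ column-layer interactions.

*These are called first-order interactions.

The probability element of the $y_{ijk}$ is

(c)    $(2\pi\sigma^2)^{-\frac{rst}{2}}\exp\Bigl[-\frac{1}{2\sigma^2}\sum_{i,j,k}(y_{ijk}-m-I_{ijk})^2\Bigr]\prod_{i,j,k} dy_{ijk}$.

The sum of squares in the exponent of (c) is

(d)    $\sum_{i,j,k}(y_{ijk}-m-I_{ijk})^2$.

Now let

(e)    $\bar y_{ij\cdot} = \frac{1}{t}\sum_{k} y_{ijk}$, with similar meanings for $\bar y_{i\cdot k}$ and $\bar y_{\cdot jk}$,
       $\bar y_{i\cdot\cdot} = \frac{1}{st}\sum_{j,k} y_{ijk}$, with similar meanings for $\bar y_{\cdot j\cdot}$ and $\bar y_{\cdot\cdot k}$,
       $\bar y = \frac{1}{rst}\sum_{i,j,k} y_{ijk}$,
       $Y_{i\cdot\cdot} = \bar y_{i\cdot\cdot}-\bar y$, with similar meanings for $Y_{\cdot j\cdot}$ and $Y_{\cdot\cdot k}$,
       $Y_{ij\cdot} = \bar y_{ij\cdot}-\bar y_{i\cdot\cdot}-\bar y_{\cdot j\cdot}+\bar y$, with similar meanings for $Y_{i\cdot k}$ and $Y_{\cdot jk}$,
       $S_{\cdot\cdot\cdot} = \sum_{i,j,k}(y_{ijk}-\bar y_{ij\cdot}-\bar y_{i\cdot k}-\bar y_{\cdot jk}+\bar y_{i\cdot\cdot}+\bar y_{\cdot j\cdot}+\bar y_{\cdot\cdot k}-\bar y)^2$,
       $S_{\cdot\cdot 0} = \sum_{i,j,k}(Y_{ij\cdot}-I_{ij0})^2$, with similar meanings for $S_{\cdot 0\cdot}$ and $S_{0\cdot\cdot}$,
       $S_{\cdot 00} = \sum_{i,j,k}(Y_{i\cdot\cdot}-I_{i00})^2$, with similar meanings for $S_{0\cdot 0}$ and $S_{00\cdot}$,
       $S_{000} = \sum_{i,j,k}(\bar y-m)^2$.

We shall denote by $S^{\circ}_{\cdot\cdot 0}$, $S^{\circ}_{\cdot 0\cdot}$, $S^{\circ}_{0\cdot\cdot}$, $S^{\circ}_{\cdot 00}$, $S^{\circ}_{0\cdot 0}$, $S^{\circ}_{00\cdot}$ the values of the corresponding S's when the I's in them are set equal to 0.

We may write

(f)    $\sum_{i,j,k}(y_{ijk}-m-I_{ijk})^2 = \sum_{i,j,k}\bigl[(y_{ijk}-\bar y_{ij\cdot}-\bar y_{i\cdot k}-\bar y_{\cdot jk}+\bar y_{i\cdot\cdot}+\bar y_{\cdot j\cdot}+\bar y_{\cdot\cdot k}-\bar y)$
           $+ (Y_{ij\cdot}-I_{ij0}) + (Y_{i\cdot k}-I_{i0k}) + (Y_{\cdot jk}-I_{0jk})$
           $+ (Y_{i\cdot\cdot}-I_{i00}) + (Y_{\cdot j\cdot}-I_{0j0}) + (Y_{\cdot\cdot k}-I_{00k}) + (\bar y-m)\bigr]^2$.

Squaring the quantity in [ ], keeping the expressions within the parentheses intact, and summing with respect to i, j, k, we obtain

(g)    $\sum_{i,j,k}(y_{ijk}-m-I_{ijk})^2 = S_{\cdot\cdot\cdot} + S_{\cdot\cdot 0} + S_{\cdot 0\cdot} + S_{0\cdot\cdot} + S_{\cdot 00} + S_{0\cdot 0} + S_{00\cdot} + S_{000}$.

It follows from Cochran's Theorem, §5.24, (and can also be shown by moment-generating functions) that the eight sums of squares on the right side of (g), each divided by $\sigma^2$, are independently distributed according to $\chi^2$-laws with $(r-1)(s-1)(t-1)$, $(r-1)(s-1)$, $(r-1)(t-1)$, $(s-1)(t-1)$, $(r-1)$, $(s-1)$, $(t-1)$, 1 degrees of freedom, respectively, if the $y_{ijk}$ are distributed according to (a).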

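The sums of squares in (g) provide the basis for testing various hypotheses concerning the interactions $I_{ij0}$, $I_{i0k}$, $I_{0jk}$ and the main effects $I_{i00}$, $I_{0j0}$, $I_{00k}$. For example, suppose we wish to test the hypothesis that row-column interaction is zero (i.e., each $I_{ij0} = 0$) no matter what the row-layer and column-layer interactions and main effects may be. This hypothesis, say $H[(I_{ij0}) = 0]$, may be specified as follows:

(h)    $\Omega$:  $-\infty < m, I_{ij0}, I_{i0k}, I_{0jk}, I_{i00}, I_{0j0}, I_{00k} < \infty$, $\sigma^2 > 0$, for all i, j, k, the sum of the I's in each set over any index being 0.

       $\omega$:  Subspace of $\Omega$ obtained by setting each $I_{ij0} = 0$.

Maximizing the likelihood in (c) for variations of the parameters over $\Omega$, we find

(i)    $\hat\sigma_\Omega^2 = \dfrac{1}{rst}S_{\cdot\cdot\cdot}$,

and maximizing the likelihood for variations of the parameters over $\omega$, we find

(j)    $\hat\sigma_\omega^2 = \dfrac{1}{rst}\bigl(S_{\cdot\cdot\cdot} + S^{\circ}_{\cdot\cdot 0}\bigr)$.

It should be noted that in maximizing the likelihood over $\Omega$ we obtain as maximum likelihood estimates of m, $I_{ij0}$, $I_{i0k}$, $I_{0jk}$, $I_{i00}$, $I_{0j0}$, $I_{00k}$ the quantities $\bar y$, $Y_{ij\cdot}$, $Y_{i\cdot k}$, $Y_{\cdot jk}$, $Y_{i\cdot\cdot}$, $Y_{\cdot j\cdot}$, $Y_{\cdot\cdot k}$, respectively.

When the hypothesis $H[(I_{ij0}) = 0]$ is true it follows from Theorem (A), §8.3 (see Case 3, §8.43), that

(k)    $q_1 = \dfrac{rst\,\hat\sigma_\Omega^2}{\sigma^2} = \dfrac{S_{\cdot\cdot\cdot}}{\sigma^2}$,    $q_2 = \dfrac{rst(\hat\sigma_\omega^2-\hat\sigma_\Omega^2)}{\sigma^2} = \dfrac{S^{\circ}_{\cdot\cdot 0}}{\sigma^2}$

are independently distributed according to $\chi^2$-laws with $(r-1)(s-1)(t-1)$ and $(r-1)(s-1)$ degrees of freedom, respectively. Hence the F-ratio for testing this hypothesis is

    $F = \dfrac{(t-1)\,S^{\circ}_{\cdot\cdot 0}}{S_{\cdot\cdot\cdot}}$,

which is distributed according to $h_{(r-1)(s-1),\,(r-1)(s-1)(t-1)}(F)\,dF$ when $H[(I_{ij0}) = 0]$ is true. In a similar manner F-ratios can be set up for testing the hypothesis of zero row-layer or zero column-layer interaction.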

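The constituents in (g) also provide a method of testing the hypothesis of no interaction between rows and columns in a two-way layout from t $(t \geq 2)$ replications of the layout. This hypothesis amounts to the hypothesis that effects due to rows and columns are additive on the mean value of the $y_{ij}$, in which case the mean value of $y_{ij}$ is of the form $m + I_{i00} + I_{0j0}$. In this problem we consider $y_{ijk}$ ($i = 1,2,\ldots,r$; $j = 1,2,\ldots,s$) as the variables associated with the k-th replicate, and assume the mean value of $y_{ijk}$ to be $m + I_{ij0} + I_{i00} + I_{0j0}$. The problem is to test the hypothesis that each $I_{ij0} = 0$. This hypothesis, which will be called $H'[(I_{ij0}) = 0]$, is specified as follows:

(l)    $\Omega$:  $-\infty < m, I_{ij0}, I_{i00}, I_{0j0} < \infty$, $\sigma^2 > 0$, for each i and j, where the sum of the I's in each set over each index is 0.

       $\omega$:  The subspace of $\Omega$ obtained by setting each $I_{ij0} = 0$.

Maximizing the likelihood function in (c) for variations of the parameters over $\Omega$, we find

(m)    $\hat\sigma_\Omega^2 = \dfrac{1}{rst}\sum_{i,j,k}(y_{ijk}-\bar y_{ij\cdot})^2 = \dfrac{1}{rst}\bigl(S_{\cdot\cdot\cdot} + S^{\circ}_{\cdot 0\cdot} + S^{\circ}_{0\cdot\cdot} + S^{\circ}_{00\cdot}\bigr)$,

and similarly

(n)    $\hat\sigma_\omega^2 = \dfrac{1}{rst}\bigl(S_{\cdot\cdot\cdot} + S^{\circ}_{\cdot 0\cdot} + S^{\circ}_{0\cdot\cdot} + S^{\circ}_{00\cdot} + S^{\circ}_{\cdot\cdot 0}\bigr)$.

By Theorem (A), §8.3,

(o)    $q_1 = \dfrac{1}{\sigma^2}\bigl(S_{\cdot\cdot\cdot} + S^{\circ}_{\cdot 0\cdot} + S^{\circ}_{0\cdot\cdot} + S^{\circ}_{00\cdot}\bigr)$,    $q_2 = \dfrac{1}{\sigma^2}S^{\circ}_{\cdot\cdot 0}$

are independently distributed according to $\chi^2$-laws with $rs(t-1)$ and $(r-1)(s-1)$ degrees of freedom, respectively, when $H'[(I_{ij0}) = 0]$ is true, and hence under the same assumptions

(p)    $F = \dfrac{rs(t-1)\,S^{\circ}_{\cdot\cdot 0}}{(r-1)(s-1)\bigl(S_{\cdot\cdot\cdot} + S^{\circ}_{\cdot 0\cdot} + S^{\circ}_{0\cdot\cdot} + S^{\circ}_{00\cdot}\bigr)}$

is distributed according to $h_{(r-1)(s-1),\,rs(t-1)}(F)\,dF$.

In a similar manner, the existence of second-order interaction in a three-way layout may be tested on basis of replications of the three-way layout. This problem, however, leads us into four-way layouts and the details must be left to the reader.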

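Suppose we are interested in testing the hypothesis that the $I_{i00} = 0$ no matter what the interactions and main effects due to columns and layers may be. This hypothesis, say $H[(I_{i00}) = 0]$, may be specified as follows:

(q)    $\Omega$:  Same as $\Omega$ in (h).

       $\omega$:  Subspace of $\Omega$ for which each $I_{i00} = 0$.

We have

    $\hat\sigma_\Omega^2 = \dfrac{1}{rst}S_{\cdot\cdot\cdot}$

and

    $\hat\sigma_\omega^2 = \dfrac{1}{rst}\bigl(S_{\cdot\cdot\cdot} + S^{\circ}_{\cdot 00}\bigr)$,

and hence by Theorem (A), §8.3,

(r)    $q_1 = \dfrac{rst\,\hat\sigma_\Omega^2}{\sigma^2} = \dfrac{S_{\cdot\cdot\cdot}}{\sigma^2}$,    $q_2 = \dfrac{rst(\hat\sigma_\omega^2-\hat\sigma_\Omega^2)}{\sigma^2} = \dfrac{S^{\circ}_{\cdot 00}}{\sigma^2}$

are independently distributed according to $\chi^2$-laws with $(r-1)(s-1)(t-1)$ and $(r-1)$ degrees of freedom, respectively, when $H[(I_{i00}) = 0]$ is true, and the F-ratio for the hypothesis is

    $F = \dfrac{(s-1)(t-1)\,S^{\circ}_{\cdot 00}}{S_{\cdot\cdot\cdot}}$,

which is distributed according to $h_{(r-1),\,(r-1)(s-1)(t-1)}(F)\,dF$. Similar tests exist for testing the hypothesis that the $I_{0j0} = 0$ or that the $I_{00k} = 0$.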

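Suppose the interactions $I_{ij0}$, $I_{0jk}$, $I_{i0k}$ are all zero and that it is desired to test the hypothesis that the main effects due to rows are 0, i.e., $I_{i00} = 0$. This hypothesis, say $H'[(I_{i00}) = 0]$, may be specified as follows:

(s)    $\Omega$:  $-\infty < m, I_{i00}, I_{0j0}, I_{00k} < \infty$,  $\sigma^2 > 0$,
                  $\sum_{i} I_{i00} = \sum_{j} I_{0j0} = \sum_{k} I_{00k} = 0$.

       $\omega$:  Subspace of $\Omega$ obtained by setting each $I_{i00} = 0$.

We find

    $\hat\sigma_\Omega^2 = \dfrac{1}{rst}\bigl(S_{\cdot\cdot\cdot} + S^{\circ}_{\cdot\cdot 0} + S^{\circ}_{\cdot 0\cdot} + S^{\circ}_{0\cdot\cdot}\bigr)$,
    $\hat\sigma_\omega^2 = \dfrac{1}{rst}\bigl(S_{\cdot\cdot\cdot} + S^{\circ}_{\cdot\cdot 0} + S^{\circ}_{\cdot 0\cdot} + S^{\circ}_{0\cdot\cdot} + S^{\circ}_{\cdot 00}\bigr)$,

and hence

    $q_1 = \dfrac{rst\,\hat\sigma_\Omega^2}{\sigma^2} = \dfrac{1}{\sigma^2}\bigl(S_{\cdot\cdot\cdot} + S^{\circ}_{\cdot\cdot 0} + S^{\circ}_{\cdot 0\cdot} + S^{\circ}_{0\cdot\cdot}\bigr)$,    $q_2 = \dfrac{rst(\hat\sigma_\omega^2-\hat\sigma_\Omega^2)}{\sigma^2} = \dfrac{S^{\circ}_{\cdot 00}}{\sigma^2}$,

which are distributed independently according to $\chi^2$-laws with $rst - r - s - t + 2$ and $r - 1$ degrees of freedom, respectively, when $H'[(I_{i00}) = 0]$ is true. The F-ratio is

    $F = \dfrac{(rst - r - s - t + 2)\,S^{\circ}_{\cdot 00}}{(r-1)\bigl(S_{\cdot\cdot\cdot} + S^{\circ}_{\cdot\cdot 0} + S^{\circ}_{\cdot 0\cdot} + S^{\circ}_{0\cdot\cdot}\bigr)}$,

which has the distribution $h_{(r-1),\,(rst-r-s-t+2)}(F)\,dF$ when $H'[(I_{i00}) = 0]$ is true.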

The difference between the F-ratio for testing $H[(I_{i00}) = 0]$ and that for testing $H'[(I_{i00}) = 0]$ should be noted. In the first hypothesis the interactions are not assumed to be zero, and in the second one the interactions are assumed to be zero. The sum of squares in the denominator of F for the first hypothesis is simply $S_{\cdot\cdot\cdot}$ while it is $S_{\cdot\cdot\cdot} + S^{\circ}_{\cdot\cdot 0} + S^{\circ}_{\cdot 0\cdot} + S^{\circ}_{0\cdot\cdot}$ in the F for the second hypothesis. The terms $S^{\circ}_{\cdot\cdot 0}$, $S^{\circ}_{\cdot 0\cdot}$, $S^{\circ}_{0\cdot\cdot}$ are commonly known as interaction sums of squares, and the process of adding these to the error sum of squares $S_{\cdot\cdot\cdot}$ in the case of testing $H'$ is often referred to as confounding first-order interactions with error. Of course, the hypothesis may be set up in such a way that only two (or even only one) of the interaction sums of squares will be confounded with error. The term confounding as it is commonly used is more general than it is in the sense used above. For example, if layer effects $I_{00k}$ had been assumed to be zero throughout the hypothesis specified by (s) we would have found not only all first-order interaction sums of squares but also the layer effect sum of squares $S^{\circ}_{00\cdot}$ confounded with $S_{\cdot\cdot\cdot}$.

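There are many hypotheses which can be tested on basis of the S's on the right hand side of (g), and we shall make no attempt to catalogue them here. It is perhaps sufficient to summarize the constituents of the various possible tests in the following analysis of variance table (the $\sum$ extending over all values of i, j, k in each case):

    Variation Due To            Sum of Squares                                                                                                                         Degrees of Freedom
    Rows                        $S^{\circ}_{\cdot 00} = \sum(\bar y_{i\cdot\cdot}-\bar y)^2$                                                                        $r-1$
    Columns                     $S^{\circ}_{0\cdot 0} = \sum(\bar y_{\cdot j\cdot}-\bar y)^2$                                                                       $s-1$
    Layers                      $S^{\circ}_{00\cdot} = \sum(\bar y_{\cdot\cdot k}-\bar y)^2$                                                                        $t-1$
    Row-Column Interaction      $S^{\circ}_{\cdot\cdot 0} = \sum(\bar y_{ij\cdot}-\bar y_{i\cdot\cdot}-\bar y_{\cdot j\cdot}+\bar y)^2$                             $(r-1)(s-1)$
    Row-Layer Interaction       $S^{\circ}_{\cdot 0\cdot} = \sum(\bar y_{i\cdot k}-\bar y_{i\cdot\cdot}-\bar y_{\cdot\cdot k}+\bar y)^2$                            $(r-1)(t-1)$
    Column-Layer Interaction    $S^{\circ}_{0\cdot\cdot} = \sum(\bar y_{\cdot jk}-\bar y_{\cdot j\cdot}-\bar y_{\cdot\cdot k}+\bar y)^2$                            $(s-1)(t-1)$
    Error                       $S_{\cdot\cdot\cdot} = \sum(y_{ijk}-\bar y_{ij\cdot}-\bar y_{i\cdot k}-\bar y_{\cdot jk}+\bar y_{i\cdot\cdot}+\bar y_{\cdot j\cdot}+\bar y_{\cdot\cdot k}-\bar y)^2$    $(r-1)(s-1)(t-1)$
    Total                       $S_T = \sum(y_{ijk}-\bar y)^2$                                                                                                       $rst-1$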
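
The additivity of the sums of squares in such a table can be verified numerically. The sketch below is illustrative only: the function name, the dictionary keys and the 2 by 2 by 2 array are hypothetical. Each sum of squares extends over all cells (i, j, k), as in the table.

```python
from itertools import product


def three_way_sums(y):
    """Sums of squares for y[i][j][k]; each sum extends over all cells (i, j, k)."""
    r, s, t = len(y), len(y[0]), len(y[0][0])
    cells = list(product(range(r), range(s), range(t)))

    def mean(keep):
        """Mean of y over the indices NOT listed in `keep`, as a function of the cell."""
        tot, cnt = {}, {}
        for c in cells:
            key = tuple(c[a] for a in keep)
            tot[key] = tot.get(key, 0.0) + y[c[0]][c[1]][c[2]]
            cnt[key] = cnt.get(key, 0) + 1
        return lambda c: tot[tuple(c[a] for a in keep)] / cnt[tuple(c[a] for a in keep)]

    g, mi, mj, mk = mean(()), mean((0,)), mean((1,)), mean((2,))
    mij, mik, mjk = mean((0, 1)), mean((0, 2)), mean((1, 2))
    val = lambda c: y[c[0]][c[1]][c[2]]
    ss = lambda f: sum(f(c) ** 2 for c in cells)
    return {
        "rows": ss(lambda c: mi(c) - g(c)),
        "columns": ss(lambda c: mj(c) - g(c)),
        "layers": ss(lambda c: mk(c) - g(c)),
        "row-column": ss(lambda c: mij(c) - mi(c) - mj(c) + g(c)),
        "row-layer": ss(lambda c: mik(c) - mi(c) - mk(c) + g(c)),
        "column-layer": ss(lambda c: mjk(c) - mj(c) - mk(c) + g(c)),
        "error": ss(lambda c: val(c) - mij(c) - mik(c) - mjk(c)
                    + mi(c) + mj(c) + mk(c) - g(c)),
        "total": ss(lambda c: val(c) - g(c)),
    }


sums = three_way_sums([[[1.0, 2.0], [3.0, 5.0]], [[4.0, 4.0], [6.0, 9.0]]])
parts = sum(v for name, v in sums.items() if name != "total")
```

In this example the row means are 2.75 and 5.75 about a grand mean of 4.25, so the row sum of squares over the eight cells is $8(1.5)^2 = 18$, and the seven constituents add up to the total.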


9 ( ^ 4 Intln 3qviares 

Suppose 7^, (i,j -> i,2,...,r) are random variables distributed acoordixig to 
N(m + + Cj + 1^,0^), where that each T^ occurs In con- 

Jxmctlon with each once and only once, and with each Cj once and only once, each 
occurring once and only once in conjunction with each Cj. Such an arrangetn^t of oonblna- 
tlons of attributes Is known as a Latin 3quare arrangement. For a given r there are nany 
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such aiwmgeiDWits, each of which can be represented In a square array In which the 
would be row effects . the Cj column effects and the treatment effects . For example, 
when r • the following is a Latin Sqxrnre arrangement of row, column and treatment 
effects : 


R ^ +C ^ +T ^ 

^1 ^^2^^2 

R^ +Cj+Tj 

‘^^ 4'*''^4 

1^2’*'^ 1 

Rg+Cg+T, 


^ 2 ‘*'^ 4‘*'^3 

R^'fC^ '♦‘T j 


R^+Cj+T^ 

R5+C4+T2 

'♦‘^2 

^ 4 ‘*'^ 2‘*’^3 

^ 4 ‘*‘^5 '*’'^4 

^ 4 ‘*‘^ 4 ‘^^ 1 


Fisher and Yates ( Statistical Tables . Oliver and Boyd, Edinburgh, 1938) have tabulated 
Latin Squares up to size 12 by 12. 

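Now consider the following hypothesis, say $H[(T_t)=0]$, to be tested on the basis of
the sample $y_{ij}$:

    $\Omega$:  $-\infty < m, R_i, C_j, T_t < +\infty$,  $\sigma^2 > 0$,  $\sum_i R_i = \sum_j C_j = \sum_t T_t = 0$,

    $\omega$:  Subspace in $\Omega$ obtained by setting each $T_t = 0$.

In other words, we wish to test the hypothesis that the $T_t$ are all zero, assuming that the
$y_{ij}$ are distributed according to $N(m+R_i+C_j+T_t, \sigma^2)$. The probability element of the $y_{ij}$ is

(a)    $\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{r^2} \exp\left[-\frac{1}{2\sigma^2}\sum_{i,j}(y_{ij}-m-R_i-C_j-T_t)^2\right] \prod_{i,j} dy_{ij}$.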

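The sum of squares $S$ in the exponent may be written as

(b)    $S = S_E + S_R + S_C + S_T + r^2(\bar{y}-m)^2$,

where

    $S_E = \sum_{i,j}(y_{ij}-\bar{y}_{i.}-\bar{y}_{.j}-\bar{y}_{(t)}+2\bar{y})^2$,

    $S_R = \sum_{i,j}(\bar{y}_{i.}-\bar{y}-R_i)^2$,

    $S_C = \sum_{i,j}(\bar{y}_{.j}-\bar{y}-C_j)^2$,

    $S_T = \sum_{i,j}(\bar{y}_{(t)}-\bar{y}-T_t)^2$,

where $\bar{y}_{i.} = \frac{1}{r}\sum_j y_{ij}$, $\bar{y}_{.j} = \frac{1}{r}\sum_i y_{ij}$, $\bar{y} = \frac{1}{r^2}\sum_{i,j} y_{ij}$ and $\bar{y}_{(t)} = \frac{1}{r}\sum_{(t)} y_{ij}$, $\sum_{(t)}$ denoting
summation over all cells ($i$ and $j$) in the Latin Square array in which $T_t$ occurs. Let $S_R^0$
be the value of $S_R$ when the $R_i = 0$, with similar meanings for $S_C^0$ and $S_T^0$.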

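Maximizing the likelihood function in (a) for variations of the parameters over
$\Omega$ we find

(c)    $\hat{m} = \bar{y}$,  $\hat{R}_i = \bar{y}_{i.} - \bar{y}$,  $\hat{C}_j = \bar{y}_{.j} - \bar{y}$,  $\hat{T}_t = \bar{y}_{(t)} - \bar{y}$,

       $\hat{\sigma}_\Omega^2 = \frac{1}{r^2}\sum_{i,j}(y_{ij}-\bar{y}_{i.}-\bar{y}_{.j}-\bar{y}_{(t)}+2\bar{y})^2 = \frac{1}{r^2} S_E$.

Maximizing the likelihood for variations of the parameters over $\omega$, we set $T_t = 0$
($t = 1,2,\ldots,r$) and maximize for variations of $m$, $R_i$, $C_j$ and $\sigma^2$. We find
$\hat{m}$, $\hat{R}_i$, $\hat{C}_j$ to be the same as those obtained by maximizing over $\Omega$, and

    $\hat{\sigma}_\omega^2 = \frac{1}{r^2}(S_E + S_T^0)$.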


It follows from Theorem (A), §8.3, (see Case 3, §8.43) that $q_1$ and $q_2$ are independently
distributed according to the $\chi^2$-laws with $(r-1)(r-2)$ and $(r-1)$ degrees of freedom
respectively when $H[(T_t)=0]$ is true, where

    $q_1 = \frac{r^2\hat{\sigma}_\Omega^2}{\sigma^2} = \frac{S_E}{\sigma^2}$,    $q_2 = \frac{r^2(\hat{\sigma}_\omega^2-\hat{\sigma}_\Omega^2)}{\sigma^2} = \frac{S_T^0}{\sigma^2}$,

and hence

(d)    $F = \frac{(r-2)\,q_2}{q_1} = \frac{(r-2)\,S_T^0}{S_E}$

is distributed according to $h_{(r-1),(r-1)(r-2)}(F)\,dF$, and is equivalent to the likelihood
ratio criterion for testing $H[(T_t)=0]$, it being understood, of course, that critical values
of $F$ for a given significance level are obtained by using the upper tail of the $F$ distribution.

In a similar manner, if $H[(R_i)=0]$ denotes the hypothesis for which $\Omega$ is identical
with that for $H[(T_t)=0]$ while $\omega$ is the subspace in $\Omega$ for which $R_1 = R_2 = \ldots = R_r = 0$,
then we obtain for $F$ the following:

(e)    $F = \frac{(r-2)\,S_R^0}{S_E}$,

which is distributed according to $h_{(r-1),(r-1)(r-2)}(F)\,dF$ when $H[(R_i)=0]$ is true.

An entirely similar hypothesis, say $H[(C_j)=0]$, may be defined by considering $\omega$
as the subspace in $\Omega$ for which $C_1 = C_2 = \ldots = C_r = 0$, and an $F$ similar to (e) is
obtained with the same distribution as that of $F$ defined by (e).

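We may summarize in the following analysis of variance table:

Variation Due to    Sum of Squares                                                                        Degrees of Freedom

Rows                $S_R^0 = r\sum_i(\bar{y}_{i.}-\bar{y})^2$                                             $r-1$
Columns             $S_C^0 = r\sum_j(\bar{y}_{.j}-\bar{y})^2$                                             $r-1$
Treatments          $S_T^0 = r\sum_t(\bar{y}_{(t)}-\bar{y})^2$                                            $r-1$
Error               $S_E = \sum_{i,j}(y_{ij}-\bar{y}_{i.}-\bar{y}_{.j}-\bar{y}_{(t)}+2\bar{y})^2$        $(r-1)(r-2)$
Total               $S = \sum_{i,j}(y_{ij}-\bar{y})^2$                                                    $r^2-1$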


The main properties relating to the constituents of this table are the following:

(1) $S = S_R^0 + S_C^0 + S_T^0 + S_E$.

(2) $S_R^0/\sigma^2$, $S_C^0/\sigma^2$, $S_T^0/\sigma^2$, $S_E/\sigma^2$ are independently distributed according to $\chi^2$-laws with
$r-1$, $r-1$, $r-1$, $(r-1)(r-2)$ degrees of freedom respectively, when all $R_i$, $C_j$ and $T_t$
are zero.

(3) $F = \frac{(r-2)\,S_R^0}{S_E}$ is distributed according to $h_{(r-1),(r-1)(r-2)}(F)\,dF$ when $H[(R_i)=0]$ is true.

(4) $F = \frac{(r-2)\,S_C^0}{S_E}$ is distributed according to $h_{(r-1),(r-1)(r-2)}(F)\,dF$ when $H[(C_j)=0]$ is true.

(5) $F = \frac{(r-2)\,S_T^0}{S_E}$ is distributed according to $h_{(r-1),(r-1)(r-2)}(F)\,dF$ when $H[(T_t)=0]$ is true.

(6) $S_E/\sigma^2$ is distributed according to the $\chi^2$-law with $(r-1)(r-2)$ degrees of freedom for
any parameter point in $\Omega$ (i.e. no matter what values $m$ and the $R_i$, $C_j$, $T_t$ may have).

(7) $S/\sigma^2$ is distributed according to the $\chi^2$-law with $r^2-1$ degrees of freedom when all $R_i$,
$C_j$ and $T_t$ are zero.
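The decomposition in property (1) and the $F$ ratio for treatments can be checked numerically. The following Python sketch is an illustration only: the 4 by 4 yields and the cyclic treatment layout are made-up values, and the function name is hypothetical.

```python
def latin_square_anova(y, square):
    """Sums of squares for an r x r Latin Square layout.

    y[i][j] is the observation in row i, column j; square[i][j] is its treatment (1..r).
    """
    r = len(y)
    cells = [(i, j) for i in range(r) for j in range(r)]
    grand = sum(y[i][j] for i, j in cells) / r ** 2
    row_mean = [sum(y[i]) / r for i in range(r)]
    col_mean = [sum(y[i][j] for i in range(r)) / r for j in range(r)]
    trt_mean = {t: sum(y[i][j] for i, j in cells if square[i][j] == t) / r
                for t in range(1, r + 1)}
    s_rows = r * sum((m - grand) ** 2 for m in row_mean)
    s_cols = r * sum((m - grand) ** 2 for m in col_mean)
    s_trt = r * sum((m - grand) ** 2 for m in trt_mean.values())
    s_err = sum((y[i][j] - row_mean[i] - col_mean[j]
                 - trt_mean[square[i][j]] + 2 * grand) ** 2 for i, j in cells)
    s_tot = sum((y[i][j] - grand) ** 2 for i, j in cells)
    return {"rows": s_rows, "columns": s_cols, "treatments": s_trt,
            "error": s_err, "total": s_tot,
            # F ratio with (r-1) and (r-1)(r-2) degrees of freedom
            "F_treatments": (r - 2) * s_trt / s_err}


# Hypothetical data, for illustration only.
square = [[1, 2, 3, 4], [4, 1, 2, 3], [3, 4, 1, 2], [2, 3, 4, 1]]
y = [[12.1, 14.3, 13.8, 15.0],
     [13.5, 11.9, 14.8, 14.1],
     [14.0, 15.2, 12.2, 14.9],
     [15.3, 14.4, 15.1, 12.5]]
result = latin_square_anova(y, square)
```

The returned value of `F_treatments` would be referred to the upper tail of the $F$ distribution with $r-1$ and $(r-1)(r-2)$ degrees of freedom.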









The reader will find it instructive to verify that $S_E$ is the minimum of
$\sum_{i,j}(y_{ij}-m-R_i-C_j-T_t)^2$ for variations of the $m$, $R_i$, $C_j$ and $T_t$ subject to the restrictions
$\sum_i R_i = \sum_j C_j = \sum_t T_t = 0$, and can be obtained by applying formula (k) of §8.5, noting
that all $x_{p\alpha}$ are 0 or 1.

As in the case of two- and three-way and higher order layouts, Latin Square layouts
have been widely used in agricultural experiments. For example, in studying the effects
of $r$ types of fertilizer on yields of a certain variety of wheat, it is common to
lay out a square array of $r^2$ plots of equal area and to associate row and column effects
with variations in fertility of soil and associate treatments with different fertilizers.
The main assumption in such an application is that variation in fertility of soil from
plot to plot is such that yield on the plot in the $i$-th row and $j$-th column may be regarded
as a normally distributed random variable with mean value of the form $m + R_i + C_j + T_t$
(where $\sum_i R_i = \sum_j C_j = \sum_t T_t = 0$, $T_t$ being the effect of the $t$-th treatment)
and variance $\sigma^2$ which is the same for all plots.

Latin Square layouts have also been tried out in other fields, for example in
industrial research.


9.5 Graeco-Latin Squares


Higher order Latin Squares, known as Graeco-Latin squares, may be treated in much
the same manner as Latin Squares. A Graeco-Latin square involving, for example, a four-way
classification may be defined as follows: Let $\{\alpha_i\}$, $\{\beta_i\}$, $\{\gamma_i\}$, $\{\delta_i\}$, $i = 1,2,\ldots,r$,
be four sets of mutually exclusive attributes. Let $r^2$ objects be arranged in such a way
that $r$ of the objects have attribute $\alpha_i$, $r$ have attribute $\beta_i$, $r$ have attribute $\gamma_i$ and
$r$ have attribute $\delta_i$, $i = 1,2,\ldots,r$, and in such a way that exactly one object has the
combination of attributes $(\alpha_i, \beta_j)$, $i, j = 1,2,\ldots,r$, exactly one has the combination
$(\alpha_i, \gamma_j)$, and so on for each of the combinations $(\alpha_i,\delta_j)$, $(\beta_i,\gamma_j)$, $(\beta_i,\delta_j)$, $(\gamma_i,\delta_j)$. We may
conveniently allow the $\alpha_i$ to refer to rows, $\beta_j$ to columns, $\gamma_t$ to treatments in an ordinary
Latin Square and let $\delta_u$ refer to the fourth classification. Let $y_{ij}$ ($i,j = 1,2,\ldots,r$) be
random variables distributed according to $N(m+R_i+C_j+T_t+U_u, \sigma^2)$ where $R_i$, $C_j$, $T_t$, $U_u$ are
effects due to $\alpha_i$, $\beta_j$, $\gamma_t$, $\delta_u$ and where $\sum R_i = \sum C_j = \sum T_t = \sum U_u = 0$. As a matter
of fact, we may consider the four-way classification Graeco-Latin square as a superposition
of the two Latin squares $\{\alpha_i\}$, $\{\beta_j\}$, $\{\gamma_t\}$ and $\{\alpha_i\}$, $\{\beta_j\}$, $\{\delta_u\}$, the $\alpha_i$ and $\beta_j$
referring to rows and columns in both cases, the $\gamma_t$ as treatments in the first Latin square,
and $\delta_u$ as treatments in the second Latin square, such that when the two Latin squares are
superimposed each $\gamma_t$ will occur with each $\delta_u$ exactly once. Two Latin squares which have
this property are said to be orthogonal. A set of $r - 1$ mutually orthogonal Latin squares
is said to form a complete set of mutually orthogonal Latin squares and when superimposed
would form an $(r+1)$-way classification Graeco-Latin square. Complete sets of orthogonal
Latin squares exist when $r$ is a prime integer and also for certain other values, e.g.
$r = 4, 8, 9$. The sum of squares $S$ in the likelihood function is

    $S = \sum_{i,j}(y_{ij}-m-R_i-C_j-T_t-U_u)^2$

and may be written as

    $S = S_E + S_R + S_C + S_T + S_U + r^2(\bar{y}-m)^2$,

where

    $S_R = \sum_{i,j}(\bar{y}_{i.}-\bar{y}-R_i)^2$,

    $S_C = \sum_{i,j}(\bar{y}_{.j}-\bar{y}-C_j)^2$,

    $S_T = \sum_{i,j}(\bar{y}_{(t)}-\bar{y}-T_t)^2$,

    $S_U = \sum_{i,j}(\bar{y}_{[u]}-\bar{y}-U_u)^2$,

where $\bar{y}_{i.}$, $\bar{y}_{.j}$, $\bar{y}$, $\bar{y}_{(t)}$ are defined in §9.4 and $\bar{y}_{[u]}$ is the average of all $y_{ij}$ having
mean values involving $U_u$. Let $S_R^0$ be the value of $S_R$ when the $R_i = 0$, with similar meanings
for $S_C^0$, $S_T^0$ and $S_U^0$.

As before, we may define hypotheses $H[(R_i)=0]$, $H[(C_j)=0]$, $H[(T_t)=0]$, and
$H[(U_u)=0]$ all with the same $\Omega$ parameter space given by

    $\Omega$:  $-\infty < m, R_i, C_j, T_t, U_u < +\infty$,  $\sigma^2 > 0$,
               $\sum R_i = \sum C_j = \sum T_t = \sum U_u = 0$,

but with $\omega$ parameter spaces obtained by setting each $R_i = 0$, each $C_j = 0$, each $T_t = 0$, and
each $U_u = 0$, respectively. The $F$ ratios for these four hypotheses may be written down by
the reader in terms of $S_E$, $S_R^0$, $S_C^0$, $S_T^0$ and $S_U^0$.

The analysis of variance table for the four-way Graeco-Latin square turns out to
be as follows:

Variation Due to    Sum of Squares                                                                                          Degrees of Freedom

The $\alpha_i$      $S_R^0 = r\sum_i(\bar{y}_{i.}-\bar{y})^2$                                                               $r-1$
The $\beta_j$       $S_C^0 = r\sum_j(\bar{y}_{.j}-\bar{y})^2$                                                               $r-1$
The $\gamma_t$      $S_T^0 = r\sum_t(\bar{y}_{(t)}-\bar{y})^2$                                                              $r-1$
The $\delta_u$      $S_U^0 = r\sum_u(\bar{y}_{[u]}-\bar{y})^2$                                                              $r-1$
Error               $S_E = \sum_{i,j}(y_{ij}-\bar{y}_{i.}-\bar{y}_{.j}-\bar{y}_{(t)}-\bar{y}_{[u]}+3\bar{y})^2$            $(r-1)(r-3)$
Total               $S = \sum_{i,j}(y_{ij}-\bar{y})^2$                                                                      $r^2-1$


where $\bar{y}_{i.}$, $\bar{y}_{.j}$, $\bar{y}$, $\bar{y}_{(t)}$ have the same meanings as for the Latin Square and $\bar{y}_{[u]}$ is the
mean of all $y_{ij}$ having attribute $\delta_u$.

The properties of the constituents of this table are very similar to those of
the constituents of the table pertaining to the ordinary Latin Square and therefore we
shall not write them down. The reader may verify that $S_E$ is the minimum of
$\sum_{i,j}(y_{ij}-m-R_i-C_j-T_t-U_u)^2$, subject to the restrictions
$\sum R_i = \sum C_j = \sum T_t = \sum U_u = 0$, and is obtainable from formula (k), §8.5.

Extensions to higher order Graeco-Latin squares and complete sets of Latin
squares are straightforward.
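The notion of orthogonal Latin squares can be illustrated with a short Python sketch. It uses the standard modular construction $L_k(i,j) = (ki + j) \bmod r$, $k = 1,\ldots,r-1$, which is an assumption brought in for illustration (the text does not give a construction) and which works as written only when $r$ is prime; the function names are hypothetical.

```python
def modular_squares(r):
    """Return the r-1 squares L_k(i, j) = (k*i + j) mod r, k = 1, ..., r-1.

    Assumption: r is prime. For other r this rule need not give Latin squares.
    """
    return [[[(k * i + j) % r for j in range(r)] for i in range(r)]
            for k in range(1, r)]


def is_latin(square):
    """Every symbol 0..r-1 occurs once in each row and once in each column."""
    r = len(square)
    symbols = set(range(r))
    return (all(set(row) == symbols for row in square)
            and all({square[i][j] for i in range(r)} == symbols for j in range(r)))


def are_orthogonal(a, b):
    """Superimposing a and b, each ordered pair of symbols occurs exactly once."""
    r = len(a)
    pairs = {(a[i][j], b[i][j]) for i in range(r) for j in range(r)}
    return len(pairs) == r * r


squares = modular_squares(5)
complete = all(are_orthogonal(squares[p], squares[q])
               for p in range(len(squares)) for q in range(p + 1, len(squares)))
```

For $r = 5$ this gives four mutually orthogonal squares, i.e. a complete set. For $r = 4$ the same rule fails (the square with $k = 2$ repeats symbols within a column), so the complete sets mentioned above for $r = 4, 8, 9$ need a different construction.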

9.6 Analysis of Variance In Incomplete Layouts 

The results which have been presented in §§9.2-9.5 depend on complete or balanced
layouts in the sense that there is exactly one random variable corresponding to each
cell of the layout, or in the sense of orthogonality exemplified by Latin Squares,
Graeco-Latin squares, and complete sets of Latin squares. Because of this element of balance the
sums of squares arising in connection with the various hypotheses are relatively simple.
The problem to be considered here is that of deriving sums of squares appropriate to tests
of hypotheses in case there are arbitrary numbers of random variables associated with the
various cells.

First let us consider the case of a two-way layout. Let $y_1, y_2, \ldots, y_n$ be
random variables of the sample such that each $y$ belongs to one row and one column in an
$r$ by $s$ layout. If a $y$, say $y_\alpha$, belongs to the $i$-th row and $j$-th column, we assume it to
be distributed according to $N(m+R_i+C_j, \sigma^2)$ where $\sum_i R_i = \sum_j C_j = 0$. We may rewrite this
distribution as $N(m+\sum_i R_i x_{1i\alpha} + \sum_j C_j x_{2j\alpha}, \sigma^2)$ where for a given $\alpha$ the $x_{1i\alpha}$ ($i = 1,2,\ldots,r$)
are all zero except for the value of $i$ corresponding to the row within which $y_\alpha$ occurs
(for which $x_{1i\alpha}$ is unity), and similarly the $x_{2j\alpha}$ ($j = 1,2,\ldots,s$) are all zero except
for the value of $j$ corresponding to the column within which $y_\alpha$ occurs.

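The likelihood function for the sample $y_1, \ldots, y_n$ is

(a)    $\prod_{\alpha=1}^{n} N(m+\sum_i R_i x_{1i\alpha} + \sum_j C_j x_{2j\alpha}, \sigma^2)$,

and the sum of squares in the exponent of this likelihood function is

(b)    $S = \sum_{\alpha=1}^{n}\left(y_\alpha - m - \sum_i R_i x_{1i\alpha} - \sum_j C_j x_{2j\alpha}\right)^2$.

Now suppose we consider the hypothesis that the $C_j$ are all 0. This hypothesis $H[(C_j)=0]$
may be specified as follows:

(c)    $\Omega$:  $-\infty < m, R_i, C_j < \infty$,  $\sigma^2 > 0$  (all $i$ and $j$),  $\sum_i R_i = \sum_j C_j = 0$,

       $\omega$:  Subspace in $\Omega$ obtained by setting each $C_j = 0$.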

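Maximizing the likelihood function for variations of the parameters in $\Omega$ we find from
§8.5 that the values of $m$, the $R_i$ and $C_j$ which minimize $S$ are given by the linear equations

(d)    $-\sum y_\alpha + nm + \sum_i n_{i.}R_i + \sum_j n_{.j}C_j = 0$,

       $-\sum_{i.} y_\alpha + n_{i.}m + n_{i.}R_i + \sum_j n_{ij}C_j + \lambda_1 = 0$,    $i = 1,2,\ldots,r$,

       $-\sum_{.j} y_\alpha + n_{.j}m + \sum_i n_{ij}R_i + n_{.j}C_j + \lambda_2 = 0$,    $j = 1,2,\ldots,s$,

where $\sum y_\alpha$ denotes summation of all $y_\alpha$, $\sum_{i.} y_\alpha$ denotes summation of all $y_\alpha$ in the $i$-th
row, $\sum_{.j} y_\alpha$ denotes summation of all $y_\alpha$ in the $j$-th column, $n_{ij}$ is the number of $y_\alpha$ falling
in the cell at the intersection of the $i$-th row and $j$-th column, $n_{i.} = \sum_j n_{ij}$,
$n_{.j} = \sum_i n_{ij}$, and $\lambda_1$, $\lambda_2$ are the multipliers associated with the two linear restrictions
in (c). It follows from §8.5 that the minimum of $S$ for variations of the $m$, $R_i$ and
$C_j$ in $\Omega$ is given by

    $\min_\Omega S = \frac{\Delta}{\Delta_{00}}$,

where

(e)
$$
\Delta = \begin{vmatrix}
\sum y_\alpha^2 & \sum y_\alpha & \sum_{1.} y_\alpha & \cdots & \sum_{r.} y_\alpha & \sum_{.1} y_\alpha & \cdots & \sum_{.s} y_\alpha & 0 & 0 \\
\sum y_\alpha & n & n_{1.} & \cdots & n_{r.} & n_{.1} & \cdots & n_{.s} & 0 & 0 \\
\sum_{1.} y_\alpha & n_{1.} & n_{1.} & \cdots & 0 & n_{11} & \cdots & n_{1s} & 1 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & \vdots \\
\sum_{r.} y_\alpha & n_{r.} & 0 & \cdots & n_{r.} & n_{r1} & \cdots & n_{rs} & 1 & 0 \\
\sum_{.1} y_\alpha & n_{.1} & n_{11} & \cdots & n_{r1} & n_{.1} & \cdots & 0 & 0 & 1 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots & \vdots \\
\sum_{.s} y_\alpha & n_{.s} & n_{1s} & \cdots & n_{rs} & 0 & \cdots & n_{.s} & 0 & 1 \\
0 & 0 & 1 & \cdots & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1 & \cdots & 1 & 0 & 0
\end{vmatrix}
$$

and $\Delta_{00}$ is the minor of $\sum y_\alpha^2$ in $\Delta$. Hence

    $\hat{\sigma}_\Omega^2 = \frac{1}{n}\,\frac{\Delta}{\Delta_{00}}$.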


Maximizing the likelihood function for variations of the parameters over $\omega$ we find that
the maximizing values of $m$ and the $R_i$ are given by the $r + 1$ equations resulting by setting
$\lambda_2$ and all $C_j$ equal to zero in (d) and deleting the last equation. Similarly,

    $\min_\omega S = \frac{\Delta'}{\Delta'_{00}}$,

where $\Delta'$ is obtained by deleting the last $s + 2$ rows and columns from $\Delta$ with exception of
the next to the last row and column. $\Delta'_{00}$ is the minor of $\sum y_\alpha^2$ in $\Delta'$.
Hence

    $\hat{\sigma}_\omega^2 = \frac{1}{n}\,\frac{\Delta'}{\Delta'_{00}}$.

It follows from Theorem (A), §8.3, that

    $q_1 = \frac{n\hat{\sigma}_\Omega^2}{\sigma^2} = \frac{\Delta}{\sigma^2\Delta_{00}}$

and

    $q_2 = \frac{n(\hat{\sigma}_\omega^2 - \hat{\sigma}_\Omega^2)}{\sigma^2} = \frac{1}{\sigma^2}\left(\frac{\Delta'}{\Delta'_{00}} - \frac{\Delta}{\Delta_{00}}\right)$

are distributed independently according to $\chi^2$-laws with $n - r - s + 1$ and $s - 1$ degrees of
freedom respectively when $H[(C_j)=0]$ is true. The $F$ ratio is therefore

(f)    $F = \frac{(n-r-s+1)\left(\dfrac{\Delta'}{\Delta'_{00}} - \dfrac{\Delta}{\Delta_{00}}\right)}{(s-1)\,\dfrac{\Delta}{\Delta_{00}}}$,

which has the distribution $h_{(s-1),(n-r-s+1)}(F)\,dF$ when $H[(C_j)=0]$ is true. The reader may
verify that if $n_{ij} = 1$ (all $i$ and $j$) then $n = rs$ and we have the complete two-way layout
discussed in §9.2, and in this case the $F$ ratio reduces to that given in (b), §9.2.

The extension of the foregoing treatment to higher order layouts is straightforward
and will not be considered in detail. It is perhaps sufficient to note that in
the case of higher-order layouts we would have several sets, say $q$, of classifications, the
$u$-th classification consisting of $p_u$ mutually exclusive categories, such that each $y_\alpha$ in
the sample would belong to exactly one category in each classification. If we denote the
mean effect (on $y_\alpha$) of the $v$-th category of the $u$-th classification by $I_{uv}$, where
$\sum_{v=1}^{p_u} I_{uv} = 0$, $u = 1,2,\ldots,q$ (or more generally several linear restrictions may be applied to
the $I_{uv}$ for each $u$), then the mean value of $y_\alpha$ may be expressed as
$m + \sum_{u=1}^{q}\sum_{v=1}^{p_u} I_{uv}x_{uv\alpha}$, where,
for each value of $u$, $x_{uv\alpha}$ ($\alpha = 1,2,\ldots,n$) is unity for only one value of $v$ and zero otherwise;
the value of $v$ for which $x_{uv\alpha}$ is unity being that corresponding to the category (of
the $u$-th classification) within which $y_\alpha$ falls. The problem of testing the hypothesis
that the $I_{uv}$ for the $u$-th classification ($u$-th classification effects) are all zero amounts to
setting up a determinant corresponding to $\Delta$ in (e) based on $q$ classifications instead of
2, and performing operations similar to those performed on $\Delta$ to obtain $\Delta_{00}$, $\Delta'$, and $\Delta'_{00}$.
The reader will find it instructive to work through the details of setting up $\Delta$, $q_1$, $q_2$,
and $F$ for the case of a three-way layout when the hypothesis to be tested is that the main
effects due to one of the classifications are zero. He will also find it profitable to
treat the ordinary Latin square as a three-way layout by this method and show that the $F$
obtained for testing the hypothesis of no treatment effects is identical with that given
by (d) in §9.4. The generality of this procedure should be carefully noted by the reader
because not only can all of the results previously discussed in this chapter be obtained
by this procedure, but tests for the existence of interaction between two or more classifications
in incomplete or unbalanced layouts may be deduced by applying the procedure.

9.7 Analysis of Covariance


Throughout all of the discussion in §§9.2-9.6 we have assumed the mean value of
the random variable in each case to consist of the sum of a general constant (which is the
same for all random variables) and constants referring to rows, columns, treatments,
interaction, etc. It frequently happens that there are practical situations which suggest
that the mean value of the random variable should include linear functions of one or more
fixed variates (see §8.2) in addition to the sum of constants of the type mentioned above.
For example, if $y_{ij}$ refers to yield of wheat in a plot in the $i$-th row and $j$-th column of
a two-way layout, not only should the mean value of $y_{ij}$ include a general constant and row
and column effects, but also a linear effect of number of plants on this plot, say $x_{ij}$. The
mean value would then be of the form $m + ax_{ij} + R_i + C_j$ with the usual conditions on the
$R_i$ and $C_j$. The object of this section is to examine what modifications of §§9.2-9.6
should be made in order to take one or more fixed variates into account in the mean value
of the random variables involved.

Let us return to the two-way layout discussed in §9.2 and assume that the mean
value of $y_{ij}$ depends linearly not only on $m$, $R_i$ and $C_j$ but also on a fixed variable $x_{ij}$.
In other words, assume that the $y_{ij}$ are random variables independently distributed
according to $N(m+ax_{ij}+R_i+C_j, \sigma^2)$, where $\sum_i R_i = \sum_j C_j = 0$. The question arises as to what
forms the $F$-ratios take for testing the hypothesis that the $C_j$ are all zero or the hypothesis
that the $R_i$ are all zero, when the $\Omega$ parameter space is the $(r + s + 1)$-dimensional
space for which $-\infty < a, m, R_i, C_j < +\infty$, $\sigma^2 > 0$. The probability element of the $y_{ij}$ is
exactly that given in (a), §9.2, with $y_{ij}$ replaced by $y_{ij} - ax_{ij}$. Making this substitution
in (b), §9.2, we see that the sum of squares in the exponent of this probability element
(for any point in $\Omega$) may be broken down into the following components:

(a)    $S = \sum_{i,j}(Y_{ij}-aX_{ij})^2 + \sum_{i,j}(Y_{i.}-aX_{i.}-R_i)^2 + \sum_{i,j}(Y_{.j}-aX_{.j}-C_j)^2 + rs(\bar{y}-a\bar{x}-m)^2$,

where $Y_{ij} = y_{ij}-\bar{y}_{i.}-\bar{y}_{.j}+\bar{y}$, with similar meaning for $X_{ij}$, and $Y_{i.} = \bar{y}_{i.}-\bar{y}$, $Y_{.j} = \bar{y}_{.j}-\bar{y}$, with
similar meaning for $X_{i.}$, $X_{.j}$. The first sum of squares on the right in (a) may be written as

(b)    $\sum_{i,j}(Y_{ij}-aX_{ij})^2 = \sum_{i,j}(Y_{ij}-\hat{a}X_{ij})^2 + (a-\hat{a})^2\sum_{i,j}X_{ij}^2$,

where

    $\hat{a} = \frac{\sum_{i,j}X_{ij}Y_{ij}}{\sum_{i,j}X_{ij}^2}$.

Making the substitution (b) in (a) we obtain 5 sums of squares which when divided by $\sigma^2$
are (by Cochran's theorem) distributed independently according to $\chi^2$-laws with
$(r-1)(s-1)-1$, $1$, $r-1$, $s-1$, $1$ degrees of freedom, respectively.

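Now suppose we wish to test the hypothesis $H_1[(C_j)=0]$ which is specified as
follows:

(c)    $\Omega$:  $-\infty < a, m, R_i, C_j < \infty$,  $\sigma^2 > 0$  (all $i$, $j$),  $\sum_i R_i = \sum_j C_j = 0$,

       $\omega$:  The subspace in $\Omega$ obtained by setting each $C_j = 0$.

Maximizing the likelihood function for variations of the parameters in $\Omega$, which
is equivalent to minimizing $S$ as far as variations of $a$, $m$, $R_i$, $C_j$ are concerned, we
obtain

(d)    $\hat{a}_\Omega = \frac{\sum X_{ij}Y_{ij}}{\sum X_{ij}^2}$,  $\hat{m} = \bar{y}-\hat{a}_\Omega\bar{x}$,  $\hat{R}_i = Y_{i.}-\hat{a}_\Omega X_{i.}$,  $\hat{C}_j = Y_{.j}-\hat{a}_\Omega X_{.j}$,

       $\hat{\sigma}_\Omega^2 = \frac{1}{rs}\sum_{i,j}(Y_{ij}-\hat{a}_\Omega X_{ij})^2$.

The sum of squares in the exponent of the probability element for any point in $\omega$ (i.e.,
all $C_j = 0$) may be expressed in terms of the following components:

(e)    $S_\omega = \sum_{i,j}\left[(Y_{ij}+Y_{.j})-a(X_{ij}+X_{.j})\right]^2 + \sum_{i,j}(Y_{i.}-aX_{i.}-R_i)^2 + rs(\bar{y}-a\bar{x}-m)^2$.

Maximizing the likelihood function for variations of the parameters in $\omega$ amounts to
minimizing $S_\omega$ as far as $m$, $a$, $R_i$ are concerned. We find

    $\hat{a}_\omega = \frac{\sum_{i,j}(X_{ij}+X_{.j})(Y_{ij}+Y_{.j})}{\sum_{i,j}(X_{ij}+X_{.j})^2}$,

    $\hat{\sigma}_\omega^2 = \frac{1}{rs}\sum_{i,j}\left[(Y_{ij}+Y_{.j})-\hat{a}_\omega(X_{ij}+X_{.j})\right]^2$.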

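By Theorem (A), §8.3, it follows that

    $\frac{rs\,\hat{\sigma}_\Omega^2}{\sigma^2}$  and  $\frac{rs(\hat{\sigma}_\omega^2-\hat{\sigma}_\Omega^2)}{\sigma^2}$

are independently distributed according to $\chi^2$-laws with $(r-1)(s-1)-1$ and $s-1$ degrees of
freedom, respectively, when $H_1[(C_j)=0]$ is true. Hence the $F$-ratio for this hypothesis is

    $F = \frac{[(r-1)(s-1)-1]\,(\hat{\sigma}_\omega^2-\hat{\sigma}_\Omega^2)}{(s-1)\,\hat{\sigma}_\Omega^2}$,

which has the distribution $h_{(s-1),(r-1)(s-1)-1}(F)\,dF$ when the hypothesis is true.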
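A short Python sketch of this $F$-ratio for one fixed variate follows. It assumes the usual residual quantities $Y_{ij} = y_{ij}-\bar{y}_{i.}-\bar{y}_{.j}+\bar{y}$ and $Y_{.j} = \bar{y}_{.j}-\bar{y}$ (and likewise for $x$), and obtains each minimized sum of squares as $\sum v^2 - (\sum uv)^2/\sum u^2$. The data and the function name are hypothetical.

```python
def ancova_column_test(y, x):
    """F-ratio for 'all column effects zero' in an r x s layout with one fixed variate x.

    Returns (rs * sigma_hat_Omega^2, rs * sigma_hat_omega^2, F).
    """
    r, s = len(y), len(y[0])
    cells = [(i, j) for i in range(r) for j in range(s)]

    def components(z):
        grand = sum(sum(row) for row in z) / (r * s)
        row_dev = [sum(z[i]) / s - grand for i in range(r)]                        # Z_i.
        col_dev = [sum(z[i][j] for i in range(r)) / r - grand for j in range(s)]  # Z_.j
        resid = [[z[i][j] - grand - row_dev[i] - col_dev[j] for j in range(s)]
                 for i in range(r)]                                                # Z_ij
        return resid, col_dev

    def residual_ss(u, v):
        # Minimum over a of sum (v - a*u)^2.
        suu = sum(p * p for p in u)
        suv = sum(p * q for p, q in zip(u, v))
        svv = sum(q * q for q in v)
        return svv - suv * suv / suu

    Y, Yc = components(y)
    X, Xc = components(x)
    ss_big = residual_ss([X[i][j] for i, j in cells], [Y[i][j] for i, j in cells])
    ss_small = residual_ss([X[i][j] + Xc[j] for i, j in cells],
                           [Y[i][j] + Yc[j] for i, j in cells])
    df_error = (r - 1) * (s - 1) - 1
    f_ratio = df_error * (ss_small - ss_big) / ((s - 1) * ss_big)
    return ss_big, ss_small, f_ratio


# Hypothetical 3 x 4 layout: x = number of plants, y = yield.
x = [[20, 22, 19, 24], [21, 25, 23, 20], [18, 21, 22, 26]]
y = [[30.1, 33.5, 29.0, 35.2], [31.8, 36.0, 33.1, 31.5], [28.9, 32.2, 32.8, 37.4]]
ss_Omega, ss_omega, F = ancova_column_test(y, x)
```

Adding a constant to every observation in a row leaves all three returned values unchanged, since row effects are removed from both residual quantities.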

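It should be noted that $rs\hat{\sigma}_\Omega^2$ and $rs\hat{\sigma}_\omega^2$ can each be expressed in terms of determinants
(see (g), §8.2) as follows:

    $rs\hat{\sigma}_\Omega^2 = \frac{\begin{vmatrix} \sum Y_{ij}^2 & \sum X_{ij}Y_{ij} \\ \sum X_{ij}Y_{ij} & \sum X_{ij}^2 \end{vmatrix}}{\sum X_{ij}^2}$,

    $rs\hat{\sigma}_\omega^2 = \frac{\begin{vmatrix} \sum (Y_{ij}+Y_{.j})^2 & \sum (X_{ij}+X_{.j})(Y_{ij}+Y_{.j}) \\ \sum (X_{ij}+X_{.j})(Y_{ij}+Y_{.j}) & \sum (X_{ij}+X_{.j})^2 \end{vmatrix}}{\sum (X_{ij}+X_{.j})^2}$.

In a similar manner we may define hypothesis $H_1[(R_i)=0]$ by replacing $C_j = 0$ by $R_i = 0$ in
the specification of $\omega$. We find that $\hat{\sigma}_\Omega^2$ remains the same but $\hat{\sigma}_\omega^2$ for this hypothesis is
identical with that for $H_1[(C_j)=0]$ after replacing $Y_{.j}$ and $X_{.j}$ by $Y_{i.}$ and $X_{i.}$ respectively.
The $F$-ratio for $H_1[(R_i)=0]$ is distributed according to $h_{(r-1),(r-1)(s-1)-1}(F)\,dF$.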

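The constituents which are used in making up the $F$-ratios for testing the two
hypotheses considered above may be set forth in the following analysis of covariance table
(each $\sum$ extending over all $i$ and $j$):

Variation Due To    Sums of Squares and Products                                                                Degrees of Freedom
                    $y^2$                         $x^2$                         $xy$

Rows                $\sum Y_{i.}^2$               $\sum X_{i.}^2$               $\sum X_{i.}Y_{i.}$                      $r-1$
Columns             $\sum Y_{.j}^2$               $\sum X_{.j}^2$               $\sum X_{.j}Y_{.j}$                      $s-1$
Error               $\sum Y_{ij}^2$               $\sum X_{ij}^2$               $\sum X_{ij}Y_{ij}$                      $(r-1)(s-1)$
Total               $\sum(y_{ij}-\bar{y})^2$      $\sum(x_{ij}-\bar{x})^2$      $\sum(y_{ij}-\bar{y})(x_{ij}-\bar{x})$   $rs-1$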


The results obtained for the case of one fixed variate may be extended in a
rather straightforward manner to the case of $k$ fixed variates where $k < (r-1)(s-1)$. Thus,
if $k$ fixed variates $x_{pij}$, $p = 1,2,\ldots,k$, are taken into account linearly in our two-way
layout, we would begin by replacing $y_{ij}$ in the probability element in (a), §9.2, by
$(y_{ij}-\sum_p a_p x_{pij})$ and follow a procedure similar to that for the case of one fixed variate.
Thus, in place of $aX_{ij}$, $aX_{i.}$, $aX_{.j}$, $a\bar{x}$ in (a) we would have $\sum_p a_pX_{pij}$, $\sum_p a_pX_{pi.}$, $\sum_p a_pX_{p.j}$,
$\sum_p a_p\bar{x}_p$, respectively, where the meanings of $X_{pij}$, $X_{pi.}$, $X_{p.j}$, $\bar{x}_p$ are obvious.
The reader will find it instructive to carry out the details in arriving at $F$-ratios for
testing hypotheses $H_k[(C_j)=0]$ and $H_k[(R_i)=0]$ which are $k$-fixed-variate analogues of
$H_1[(C_j)=0]$ and $H_1[(R_i)=0]$, respectively.

The procedure which we have outlined for introducing fixed variates linearly
into the mean value of the random variables in a two-way layout extends in a straightforward
manner to three-way layouts, Latin squares, Graeco-Latin squares, and to incomplete
or non-orthogonal layouts of the type discussed in §9.6. We shall have to leave
the matter of carrying out details as exercises for the reader. Because of the generality
of §9.6 it is perhaps worth while to remark, without going through the details of proof,
that if one fixed variate is introduced linearly into the mean value of $y_\alpha$, which would
amount to replacing $m$ by $m + ax_\alpha$ in (a), §9.6, the effect on the determinant $\Delta$ as defined
in (e), §9.6, would be to insert another row and column into $\Delta$ as second row and second
column, the $r + s + 5$ elements of this row and column being

    $\sum x_\alpha y_\alpha$, $\sum x_\alpha^2$, $\sum x_\alpha$, $\sum_{1.} x_\alpha$, $\ldots$, $\sum_{r.} x_\alpha$, $\sum_{.1} x_\alpha$, $\ldots$, $\sum_{.s} x_\alpha$, $0$, $0$,

reading left to right in the row, and reading top to bottom in the column. This augmented
determinant has its own $\Delta_{00}$, $\Delta'$, $\Delta'_{00}$ (see §9.6) which are obtained by operations
analogous to those used in obtaining $\Delta_{00}$, $\Delta'$, $\Delta'_{00}$ from $\Delta$ in §9.6. The extension of our
procedure to the problem of linearly taking into account $k$ fixed variates in the mean value
of $y_\alpha$ in §9.6 is straightforward and will be left to the reader.



CHAPTER X 


ON COMBINATORIAL STATISTICAL THEORY 


Many problems in distribution or sampling theory in statistics reduce to combinatorial
considerations. For example, the derivation of the binomial distribution (§3.11)
depends on the determination of the number of distinct orders in which $x$ $p$'s and $n-x$ $q$'s
can be multiplied together, and similarly the derivation of the multinomial distribution
(§3.12) depends on the enumeration of the number of distinct orders in which $n_1$ $p_1$'s,
$n_2$ $p_2$'s, $\ldots$, $n_k$ $p_k$'s can be multiplied together where $\sum_i p_i = 1$, $\sum_i n_i = n$. A majority of
the combinatorial problems of the drawing-balls-from-urns variety involve direct applications
of permutation and combination formulas, which in turn are often simply expressible
in terms of binomial and multinomial coefficients. The theory of sampling from a finite
population (§4.3) is based on the use of binomial and multinomial coefficients and their
use as weights in various averaging operations. The sampling theory of order statistics
(§4.5) is a direct application of the multinomial distribution law to probability functions
of continuous random variables.


The object of the present chapter is to discuss some of the more complicated distribution
problems in combinatorial statistical theory which are of particular interest in
applied mathematical statistics. More specifically, we shall present some results on the
theory of runs, the theory of matching and its application to testing independence in
contingency tables, Pearson's original $\chi^2$-problem, and inspection sampling.

10.1 On the Theory of Runs

Suppose we have an arbitrary sequence of n elements, each element being one of 
several mutually exclusive kinds. Each sequence of elements of one kind is called a run.
The simplest case is that in which there are two kinds of objects. We shall consider this 
case in detail, and also present briefly some results for the case of several kinds of 
elements. 


10.11 Case of Two Kinds of Elements 

Suppose we have $n_1$ $a$'s and $n_2$ $b$'s ($n_1+n_2 = n$). Let $r_{1j}$ denote the number of runs
of $a$'s of length $j$ and $r_{2j}$ denote the number of runs of $b$'s of length $j$. For example, if
the arrangement is

    aaabbaabaabbab,

then $r_{11} = 1$, $r_{12} = 2$, $r_{13} = 1$, $r_{21} = 2$, $r_{22} = 2$, and the other $r$'s are zero. It should be
observed that $\sum_j j\,r_{1j} = n_1$, the number of $a$'s, and also $\sum_j j\,r_{2j} = n_2$. Let $r_1 = \sum_j r_{1j}$
and $r_2 = \sum_j r_{2j}$ denote the total number of runs of $a$'s and $b$'s, respectively. For a given
set of numbers $r_{11}, r_{12}, \ldots, r_{1n_1}$ there are $\frac{r_1!}{r_{11}!\,r_{12}!\cdots r_{1n_1}!}$ ways of arranging the $r_1$
runs of $a$'s. And for a specified set $r_{21}, r_{22}, \ldots, r_{2n_2}$ there are $\frac{r_2!}{r_{21}!\,r_{22}!\cdots r_{2n_2}!}$ ways of arranging
the $r_2$ runs of $b$'s. It is clear that $r_1$ cannot differ from $r_2$ by more than unity, for if
it did two runs of one kind of element would have to be adjacent, but this is contrary to
the definition of runs. If $r_1 = r_2$, a given arrangement of runs of $a$'s can be fitted into
a given arrangement of runs of $b$'s in two ways, either with a run of $a$'s first or with a
run of $b$'s first. We define the function $F(r_1,r_2)$ to be the number of ways of arranging
$r_1$ objects of one kind and $r_2$ objects of another so that no two adjacent objects are of
the same kind. Clearly,

(a)    $F(r_1,r_2) = 0$  if $|r_1-r_2| > 1$,
                  $= 1$  if $|r_1-r_2| = 1$,
                  $= 2$  if $r_1 = r_2$.

The total number of ways of getting the set $r_{ij}$ ($i = 1,2$; $j = 1,2,\ldots,n_i$) is therefore

    $\frac{r_1!}{r_{11}!\cdots r_{1n_1}!}\cdot\frac{r_2!}{r_{21}!\cdots r_{2n_2}!}\cdot F(r_1,r_2)$.

Since there are $\frac{n!}{n_1!\,n_2!}$ possible arrangements of $a$'s and $b$'s, the joint distribution function
of the given set $r_{ij}$ (all possible arrangements given equal weight) is

(b)    $p(r_{ij}) = \frac{r_1!}{r_{11}!\cdots r_{1n_1}!}\cdot\frac{r_2!}{r_{21}!\cdots r_{2n_2}!}\cdot F(r_1,r_2)\Big/\frac{n!}{n_1!\,n_2!}$.
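The counting argument behind (b) can be checked by direct enumeration for small $n_1$, $n_2$. The Python sketch below tallies the run-length counts of every arrangement and compares the resulting frequencies with the product of the two multinomial coefficients and $F(r_1,r_2)$ divided by the number of arrangements. The function names are illustrative, not from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, groupby
from math import comb, factorial


def run_counts(seq, n1, n2):
    """Return ((r_11, ..., r_1n1), (r_21, ..., r_2n2)) for a sequence of 'a'/'b'."""
    counts = {"a": [0] * n1, "b": [0] * n2}
    for kind, group in groupby(seq):
        counts[kind][len(list(group)) - 1] += 1
    return tuple(counts["a"]), tuple(counts["b"])


def F(r1, r2):
    """Ways to interleave r1 runs of one kind with r2 of the other."""
    gap = abs(r1 - r2)
    return 2 if gap == 0 else (1 if gap == 1 else 0)


def multinomial(parts):
    """(sum of parts)! divided by the product of the factorials of the parts."""
    out = factorial(sum(parts))
    for p in parts:
        out //= factorial(p)
    return out


def formula_b(ra, rb, n1, n2):
    """Probability of the run-count set (ra, rb) when all arrangements are equally likely."""
    ways = multinomial(ra) * multinomial(rb) * F(sum(ra), sum(rb))
    return Fraction(ways, comb(n1 + n2, n1))


def enumerate_all(n1, n2):
    """Exact distribution of the run-count sets by listing every arrangement."""
    n = n1 + n2
    tally = Counter()
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        seq = ["a" if p in chosen else "b" for p in range(n)]
        tally[run_counts(seq, n1, n2)] += 1
    return {key: Fraction(c, comb(n, n1)) for key, c in tally.items()}


example = run_counts("aaabbaabaabbab", 8, 6)
```

For the arrangement used as an example in the text, `example` has first component `(1, 2, 1, 0, 0, 0, 0, 0)` and second component `(2, 2, 0, 0, 0, 0)`.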

Now let us determine the joint distribution of the $r_{1j}$. To do this we sum $p(r_{ij})$
with respect to the $r_{2j}$. We wish to sum $\frac{r_2!}{r_{21}!\cdots r_{2n_2}!}$ over all partitions of $n_2$, i.e., for
all $r_{2j}$ such that $\sum_j j\,r_{2j} = n_2$ and $\sum_j r_{2j} = r_2$. In order to do this, consider

    $(x+x^2+x^3+\ldots)^{r_2} = \left(\frac{x}{1-x}\right)^{r_2} = x^{r_2}\sum_{t=0}^{\infty}\frac{(r_2-1+t)!}{(r_2-1)!\,t!}\,x^t$.

It is evident that the coefficient of $x^{n_2}$ in the initial expression is the sum
$\sum\frac{r_2!}{r_{21}!\cdots r_{2n_2}!}$ that we desire. The coefficient of $x^{n_2}$ in the final expression is the
coefficient of the term for which $r_2+t = n_2$, i.e., $t = n_2-r_2$. Therefore the desired sum is

    $\frac{(r_2-1+n_2-r_2)!}{(r_2-1)!\,(n_2-r_2)!} = \frac{(n_2-1)!}{(r_2-1)!\,(n_2-r_2)!}$,

and the joint distribution function of the $r_{1j}$ and $r_2$ is

(c)    $p(r_{1j}, r_2) = \frac{r_1!}{r_{11}!\cdots r_{1n_1}!}\cdot\frac{(n_2-1)!}{(r_2-1)!\,(n_2-r_2)!}\cdot F(r_1,r_2)\Big/\frac{n!}{n_1!\,n_2!}$.

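Now we sum out $r_2$. By (a) we get

    $\sum_{r_2}\frac{(n_2-1)!}{(r_2-1)!\,(n_2-r_2)!}\,F(r_1,r_2) = \frac{(n_2-1)!}{(r_1-2)!\,(n_2-r_1+1)!}\cdot 1 + \frac{(n_2-1)!}{(r_1-1)!\,(n_2-r_1)!}\cdot 2 + \frac{(n_2-1)!}{r_1!\,(n_2-r_1-1)!}\cdot 1 = \frac{(n_2+1)!}{r_1!\,(n_2-r_1+1)!}$.

This gives us the joint distribution function of the $r_{1j}$,

(d)    $p(r_{1j}) = \frac{r_1!}{r_{11}!\cdots r_{1n_1}!}\cdot\frac{(n_2+1)!}{r_1!\,(n_2-r_1+1)!}\Big/\frac{n!}{n_1!\,n_2!}$,

with a similar expression holding for the joint distribution of the $r_{2j}$.

Another important distribution is the joint distribution of $r_1$ and $r_2$. We get
this by summing out the $r_{1j}$ in (c), just as we summed (b) with respect to the $r_{2j}$ to obtain
(c). The result is

(e)    $p(r_1, r_2) = \frac{(n_1-1)!}{(r_1-1)!\,(n_1-r_1)!}\cdot\frac{(n_2-1)!}{(r_2-1)!\,(n_2-r_2)!}\cdot F(r_1,r_2)\Big/\frac{n!}{n_1!\,n_2!}$.

Finally, we find the distribution function of $r_1$ by summing (e) with respect to $r_2$,
obtaining

(f)    $p(r_1) = \frac{(n_1-1)!}{(r_1-1)!\,(n_1-r_1)!}\cdot\frac{(n_2+1)!}{r_1!\,(n_2-r_1+1)!}\Big/\frac{n!}{n_1!\,n_2!}$.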

The distribution of the total number of runs of $a$'s and $b$'s is of considerable
interest in applications of run theory. It is used as a test for randomness of the
arrangement of $a$'s and $b$'s; the smaller the total number of runs the more untenable the
hypothesis of randomness. Let $u = r_1 + r_2$, the total number of runs. To find the distribution
of $u$ we must sum (e) over all points in the $r_1$, $r_2$ plane for which $u = r_1 + r_2$. We
have two cases, (1) $u = 2k$ (even) and (2) $u = 2k-1$ (odd). To find the probability that
$u = 2k$, we note there is only one point in the $r_1$, $r_2$ plane for which $u = r_1 + r_2 = 2k$
where $F(r_1,r_2) \neq 0$, and that point is $(k,k)$. When $u = r_1 + r_2 = 2k - 1$ there are two
points at which $F(r_1,r_2) \neq 0$, namely $(k, k-1)$ and $(k-1, k)$. Hence from (e) we have at
once (using the notation $\binom{m}{n} = \frac{m!}{n!\,(m-n)!}$)

(g)    $\Pr(u=2k) = 2\binom{n_1-1}{k-1}\binom{n_2-1}{k-1}\Big/\binom{n}{n_1}$,

       $\Pr(u=2k-1) = \left[\binom{n_1-1}{k-1}\binom{n_2-1}{k-2} + \binom{n_1-1}{k-2}\binom{n_2-1}{k-1}\right]\Big/\binom{n}{n_1}$.

This distribution was derived by Stevens* and also by Wald and Wolfowitz** and
the function $\Pr(u \leq u') = \sum_{u=2}^{u'} p(u)$ has been tabulated by Swed and Eisenhart*** for $n_1 \leq n_2$
($n_1 = m$, $n_2 = n$ in their notation) from the case $n_1 = 2$, $n_2 = 20$ to $n_1 = 19$, $n_2 = 20$ for
various values of $u'$.
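As a check on (g), the Python sketch below evaluates the two cases with exact rational arithmetic and compares them with a direct enumeration of all arrangements for small $n_1$, $n_2$. It also forms the lower-tail sum $\Pr(u \leq u')$ used in the randomness test. Function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, groupby
from math import comb


def pr_total_runs(u, n1, n2):
    """Pr(u) for the total number of runs, n1 a's and n2 b's (both at least 1)."""
    if u < 2:
        return Fraction(0)
    total = comb(n1 + n2, n1)
    if u % 2 == 0:
        k = u // 2
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        k = (u + 1) // 2
        ways = (comb(n1 - 1, k - 1) * comb(n2 - 1, k - 2)
                + comb(n1 - 1, k - 2) * comb(n2 - 1, k - 1))
    return Fraction(ways, total)


def brute_force_runs(n1, n2):
    """Exact distribution of u obtained by listing every arrangement."""
    n = n1 + n2
    counts = {}
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        seq = ["a" if p in chosen else "b" for p in range(n)]
        u = sum(1 for _ in groupby(seq))  # number of runs in this arrangement
        counts[u] = counts.get(u, 0) + 1
    return {u: Fraction(c, comb(n, n1)) for u, c in counts.items()}


def lower_tail(u_prime, n1, n2):
    """Pr(u <= u_prime); small values cast doubt on randomness."""
    return sum(pr_total_runs(u, n1, n2) for u in range(2, u_prime + 1))
```

A small value of `lower_tail(u_observed, n1, n2)` corresponds to an arrangement with unusually few runs.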


Another probability function of considerable interest in the application of the
theory of runs is the probability of getting at least one run of $a$'s of length $s$ or
greater, or in other words the probability that at least one of the variables $r_{1s}$,
$r_{1,s+1}, \ldots$ in the distribution (d) is $\geq 1$. Mosteller**** has solved this problem for the
case $n_1 = n_2 = n$. To obtain this probability we put $n_1 = n_2 = n$ in (d), thus obtaining

(h)    $p(r_{1j}) = \frac{r_1!}{r_{11}!\cdots r_{1n}!}\binom{n+1}{r_1}\Big/\binom{2n}{n}$,

and sum over all terms such that at least one of the variables $r_{1s}, r_{1,s+1}, \ldots \geq 1$. We can
accomplish the same thing by summing over all terms such that all of these variables are
zero, and subtracting the result from unity. To do this we must sum the multinomial
coefficient in (h) over all values of $r_{11}, \ldots, r_{1,s-1}$ such that $r_{1s} = r_{1,s+1} = \ldots = 0$,
$\sum_j j\,r_{1j} = n$, $\sum_j r_{1j} = r_1$, and then sum with respect to $r_1$. It will be noted that the sum
of the multinomial coefficients under these conditions is given by the coefficient of
$x^n$ in the formal expansion of

    $(x+x^2+\ldots+x^{s-1})^{r_1} = x^{r_1}\left(\frac{1-x^{s-1}}{1-x}\right)^{r_1} = x^{r_1}\sum_{j=0}^{r_1}(-1)^j\binom{r_1}{j}x^{j(s-1)}\sum_{t=0}^{\infty}\binom{r_1-1+t}{t}x^t$,

which is

    $\sum_j(-1)^j\binom{r_1}{j}\binom{n-j(s-1)-1}{r_1-1}$,

the terms being those for which $t = n-r_1-j(s-1) \geq 0$. The desired probability of at least one
run of length $s$ or greater is therefore

(i)    $\Pr(\text{at least one of } r_{1j} \geq 1,\ j \geq s) = 1 - \sum_{r_1}\sum_{j}(-1)^j\binom{r_1}{j}\binom{n-1-j(s-1)}{r_1-1}\binom{n+1}{r_1}\Big/\binom{2n}{n}$,

the summation on $r_1$ extending over the values for which $r_1(s-1) \geq n$ (for smaller $r_1$ the
$n$ $a$'s cannot be divided into $r_1$ runs all shorter than $s$). Applying similar
methods to each of the multinomial coefficients in (b), Mosteller has shown that the probability
of getting at least one run of $a$'s or $b$'s of length $s$ or greater is

(j)    $\Pr(\text{at least one of } r_{1j} \text{ or } r_{2j} \geq 1,\ j \geq s) = 1 - A\Big/\binom{2n}{n}$,

where

    $A = 2\sum_{r_1} G(r_1)\left[G(r_1) + G(r_1+1)\right]$,    $G(r) = \sum_j(-1)^j\binom{r}{j}\binom{n-1-j(s-1)}{r-1}$,

the $r_1$ summation being similar to that in (i). Mosteller has tabulated the smallest value
of $s$ for which each of the probabilities (i) and (j) is $\leq .05$ and $.01$ for $2n = 10, 20, 30,
40, 50$.

*W. L. Stevens, "Distribution of Groups in a Sequence of Alternatives", Annals of
Eugenics, Vol. IX (1939).

**A. Wald and J. Wolfowitz, "On a Test of Whether Two Samples are from the Same
Population", Annals of Math. Stat., Vol. XI (1940).

***Frieda S. Swed and C. Eisenhart, "Tables for Testing Randomness of Grouping in a
Sequence of Alternatives", Annals of Math. Stat., Vol. XIV (1943).

****Frederick Mosteller, "Note on an Application of Runs to Quality Control Charts",
Annals of Math. Stat., Vol. XII (1941).
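The two probabilities of long runs can be checked against direct enumeration for small $n$. The Python sketch below is an illustration under stated assumptions: it computes the number of ways of splitting $n$ like elements into $r$ runs all shorter than $s$ by the alternating sum derived from the expansion of $(x + x^2 + \ldots + x^{s-1})^{r}$, and for the two-sided case it sums $F(r_1,r_2)$ times the two split counts over all $(r_1, r_2)$, which is one way of organizing the count and not necessarily the form printed by Mosteller. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, groupby
from math import comb


def short_run_splits(n, r, s):
    """Ways to split n like elements into r ordered runs (r >= 1), each shorter than s."""
    total, j = 0, 0
    while j <= r and n - r - j * (s - 1) >= 0:
        total += (-1) ** j * comb(r, j) * comb(n - j * (s - 1) - 1, r - 1)
        j += 1
    return total


def F(r1, r2):
    gap = abs(r1 - r2)
    return 2 if gap == 0 else (1 if gap == 1 else 0)


def pr_long_a_run(n, s):
    """Probability of at least one run of a's of length >= s, with n a's and n b's."""
    inner = sum(short_run_splits(n, r1, s) * comb(n + 1, r1) for r1 in range(1, n + 1))
    return 1 - Fraction(inner, comb(2 * n, n))


def pr_long_run_either(n, s):
    """Probability of at least one run of a's or b's of length >= s."""
    a_count = sum(F(r1, r2) * short_run_splits(n, r1, s) * short_run_splits(n, r2, s)
                  for r1 in range(1, n + 1) for r2 in range(1, n + 1))
    return 1 - Fraction(a_count, comb(2 * n, n))


def brute_force(n, s, kinds):
    """Same probabilities by listing every arrangement; kinds is 'a' or 'ab'."""
    hits = 0
    for positions in combinations(range(2 * n), n):
        chosen = set(positions)
        seq = ["a" if p in chosen else "b" for p in range(2 * n)]
        if any(k in kinds and len(list(g)) >= s for k, g in groupby(seq)):
            hits += 1
    return Fraction(hits, comb(2 * n, n))
```

For example, with $n = 2$ and $s = 2$ three of the six arrangements contain the run $aa$, so the one-sided probability is $1/2$.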

In order to indicate how to find moments of run variables let us consider the
case of $r_1$. We shall first find the factorial moments $E(r_1^{(a)})$, where

    $x^{(a)} = x(x-1)(x-2)\cdots(x-a+1) = \frac{x!}{(x-a)!}$,

for they are easier to find than ordinary moments in the present problem. From them the
ordinary moments may be found since $E(x^{(i)})$ is a linear function of the first $i$ ordinary
moments. Letting $i = 1,2,\ldots,a$, we obtain a system of $a$ linear equations which may be
solved to obtain the ordinary moments as linear functions of the factorial moments.

We have

(k)    $E(r_1^{(a)}) = \sum_{r_1} r_1^{(a)}\,p(r_1)$.

In order to evaluate (k) we use the following identity:

(l)    $\sum_{i=0}^{B}\frac{B!}{i!\,(B-i)!}\cdot\frac{A!}{(C+i)!\,(A-C-i)!} = \frac{(A+B)!}{(A-C)!\,(B+C)!}$,

which follows at once by equating coefficients of $x^{-C}$ in the expansion of

(m)    $(1+x)^B\left(1+\frac{1}{x}\right)^A = \frac{(1+x)^{A+B}}{x^A}$.

Therefore we have upon substituting $p(r_1)$ from (f) into (k), simplifying, and using (l),

(n)    $E(r_1^{(a)}) = (n_2+1)^{(a)}\sum_{r_1}\frac{(n_1-1)!}{(r_1-1)!\,(n_1-r_1)!}\cdot\frac{(n_2+1-a)!}{(r_1-a)!\,(n_2+1-r_1)!}\Big/\frac{n!}{n_1!\,n_2!} = \frac{(n_2+1)^{(a)}\,n_1^{(a)}}{n^{(a)}}$.

From this result we find

    $E(r_1) = \frac{(n_2+1)\,n_1}{n}$,    $\sigma_{r_1}^2 = \frac{(n_2+1)^{(2)}\,n_1^{(2)}}{n\cdot n^{(2)}}$.

A similar expression holds for $E(r_2^{(a)})$.
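The closed form for the factorial moments of $r_1$ can be compared with a direct summation over the distribution of $r_1$. The Python sketch below assumes that distribution to be $p(r_1) = \binom{n_1-1}{r_1-1}\binom{n_2+1}{r_1}/\binom{n}{n_1}$ and the closed form to be $(n_2+1)^{(a)} n_1^{(a)}/n^{(a)}$; both are checked with exact fractions. Function names are illustrative.

```python
from fractions import Fraction
from math import comb


def falling(x, a):
    """Falling factorial x(x-1)...(x-a+1)."""
    out = 1
    for i in range(a):
        out *= x - i
    return out


def p_r1(r1, n1, n2):
    """Probability that there are exactly r1 runs of a's."""
    return Fraction(comb(n1 - 1, r1 - 1) * comb(n2 + 1, r1), comb(n1 + n2, n1))


def factorial_moment_direct(a, n1, n2):
    """E(r1^(a)) by summing over the distribution of r1."""
    return sum(falling(r1, a) * p_r1(r1, n1, n2) for r1 in range(1, n1 + 1))


def factorial_moment_closed(a, n1, n2):
    """Closed form (n2+1)^(a) n1^(a) / n^(a), valid for a <= n."""
    n = n1 + n2
    return Fraction(falling(n2 + 1, a) * falling(n1, a), falling(n, a))


def variance_r1(n1, n2):
    """Variance from the first two factorial moments: E2 + E1 - E1^2."""
    e1 = factorial_moment_closed(1, n1, n2)
    e2 = factorial_moment_closed(2, n1, n2)
    return e2 + e1 - e1 * e1
```

The last function illustrates the conversion from factorial to ordinary moments mentioned above: $E(x^2) = E(x^{(2)}) + E(x)$.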

If the $a$'s and $b$'s are regarded as elements in a sample of size $n$ from a binomial
population in which $p$ and $q$ represent the probabilities associated with $a$ and $b$,
respectively, then $n_1$, the number of $a$'s, is a random variable distributed according to
the binomial law $\frac{n!}{n_1!\,n_2!}\,p^{n_1}q^{n_2}$. The probability laws analogous to (b), (c), (d), (e), (f)
when $n_1$ is regarded as a random variable in this manner are simply obtained by multiplying
each of these probability laws by $\frac{n!}{n_1!\,n_2!}\,p^{n_1}q^{n_2}$.

10.12 Case of k Kinds of Elements

The theory of runs has been extended to the case of several kinds of elements by Mood*. If there are $k$ kinds of elements, say $a_1, a_2, \ldots, a_k$, denote by $r_{ij}$ the number of runs of $a_i$ of length $j$. Let $r_i$ be the total number of runs of $a_i$. Mood has shown that the joint distribution law of the $r_{ij}$ is given by

(a) $$P(r_{ij}) = \Bigl[\prod_{i=1}^{k}\frac{r_i!}{\prod_j r_{ij}!}\Bigr]\,F(r_1, r_2, \ldots, r_k)\Big/\frac{n!}{n_1!\,n_2!\cdots n_k!},$$

where $F(r_1, r_2, \ldots, r_k)$ is the number of ways $r_1$ objects of one kind, $r_2$ objects of a second kind, and so on, can be arranged so that no two adjacent objects are of the same kind. $F(r_1, r_2, \ldots, r_k)$ is the coefficient of $x_1^{r_1}x_2^{r_2}\cdots x_k^{r_k}$ in the expansion of

(b) $$(x_1+x_2+\cdots+x_k)^{k}\,(x_2+x_3+\cdots+x_k)^{r_1-1}\,(x_1+x_3+\cdots+x_k)^{r_2-1}\cdots(x_1+x_2+\cdots+x_{k-1})^{r_k-1}.$$

*A. M. Mood, "The Theory of Runs", Annals of Math. Stat., Vol. XI (1940).


The argument for establishing (a) is very similar to that for the case of $k = 2$ and will not be repeated. Mood showed that the joint distribution function of $r_1, r_2, \ldots, r_k$ is given by

(c) $$P(r_1, r_2, \ldots, r_k) = \Bigl[\prod_{i=1}^{k}\binom{n_i-1}{r_i-1}\Bigr]\,F(r_1, r_2, \ldots, r_k)\Big/\frac{n!}{n_1!\,n_2!\cdots n_k!},$$

which we state without proof. Various moment formulas and asymptotic distribution functions have been derived by Mood in the paper cited.
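The enumeration of arrangements with no two adjacent objects alike can be checked numerically on small cases. The following Python sketch computes $F(r_1, \ldots, r_k)$ in two ways: as a coefficient in a polynomial expansion and by direct enumeration. It assumes that the generating expression is $(x_1+\cdots+x_k)^k \prod_i (\sum_{j \ne i} x_j)^{r_i-1}$ and that every $r_i \ge 1$; the function names are illustrative only and are not taken from Mood's paper.

```python
from collections import defaultdict
from itertools import permutations


def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(int)
    for ep, cp in p.items():
        for eq, cq in q.items():
            out[tuple(a + b for a, b in zip(ep, eq))] += cp * cq
    return dict(out)


def linear_form(k, skip=None):
    """x_1 + ... + x_k, optionally omitting the variable with index `skip`."""
    return {tuple(int(j == i) for j in range(k)): 1
            for i in range(k) if i != skip}


def f_from_expansion(r):
    """Coefficient of x_1^r_1 ... x_k^r_k in the assumed expression (every r_i >= 1)."""
    k = len(r)
    poly = {(0,) * k: 1}
    for _ in range(k):                      # the factor (x_1 + ... + x_k)^k
        poly = poly_mul(poly, linear_form(k))
    for i, r_i in enumerate(r):             # the factors (sum over j != i of x_j)^(r_i - 1)
        for _ in range(r_i - 1):
            poly = poly_mul(poly, linear_form(k, skip=i))
    return poly.get(tuple(r), 0)


def f_by_enumeration(r):
    """Count arrangements of r_i objects of kind i with no two adjacent objects alike."""
    objects = [i for i, r_i in enumerate(r) for _ in range(r_i)]
    distinct = set(permutations(objects))
    return sum(all(a != b for a, b in zip(seq, seq[1:])) for seq in distinct)


if __name__ == "__main__":
    for r in [(2, 2), (3, 2), (2, 2, 1), (3, 2, 1), (2, 2, 2)]:
        print(r, f_from_expansion(r), f_by_enumeration(r))
```

The brute-force count is only practical for a handful of objects; it is included purely as a cross-check on the coefficient extraction.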

If instead of holding $n_1, n_2, \ldots, n_k$ fixed in the run problem for $k$ kinds of elements, we allow the $n$'s to be random variables with probability function $\pi(n_1, n_2, \ldots, n_k)$ (e.g., the multinomial distribution with $\sum_{i=1}^{k} n_i = n$), the run distribution functions (a) and (c) would simply be multiplied by $\pi(n_1, n_2, \ldots, n_k)$.

10.2 Application of Run Theory to Ordering Within Samples

Suppose $O_{2n+1}: x_1, x_2, \ldots, x_{2n+1}$ is a sample from a population in which $x$ is a continuous random variable. Let $\tilde{x}$ be the median value of $x$ in the sample. Let each sample value of $x < \tilde{x}$ be called a and each sample value of $x > \tilde{x}$ be called b. There are $n$ a's and $n$ b's in the sample, ignoring the median (which is neither). Now suppose we consider all possible orders in which the sample x's could have been drawn (ignoring the median in each case). It is clear that all of the run distribution functions (b), (c), (d), (e), (f) are applicable, for $n_1 = n_2 = n$, to this aggregate of possible orders of the x's (i.e. a's and b's) in the sample. If there is an even number, say $2n$, of items in the sample, we can take any number between the two middle values of $x$ in the sample as a number for dividing the x's into a's and b's, and our run theory is immediately applicable to this case with $n_1 = n_2 = n$. In general if in a sample of size $kn + k - 1$ we choose the $(n+1)$th, $(2n+2)$th, $(3n+3)$th, ..., $(k-1)(n+1)$th values of $x$ in increasing order of magnitude as points of division, and let all x's less than the $(n+1)$th $x$ be denoted by $a_1$, those between the $(n+1)$th and $(2n+2)$th by $a_2$, and so on, we then reduce our sample to $n$ $a_1$'s, $n$ $a_2$'s, ..., $n$ $a_k$'s, ignoring the $k - 1$ x's used for division points. It is clear that run theory for $k$ kinds of objects is applicable to the aggregate of all possible orders in which sample x's could occur (ignoring the x's used for division points). The points of division can, of course, be taken so as to yield an arbitrary number of $a_1$'s, $a_2$'s, etc.

By classifying the values of $x$ in a sample into a's and b's (or more generally into $a_1$'s, $a_2$'s, ..., $a_k$'s) and using the theory of runs we have a basis for testing the hypothesis of randomness in the sample as far as order is concerned. The more commonly used tests of the hypothesis of randomness based on run theory are:

(1) Number of runs of a's, for which the distribution is (f), §10.11. For given
    values of $n_1$ and $n_2$, the test consists in finding the largest value of $r_1$ (the
    number of runs of a's), say $r_1^0$, for which $\Pr(r_1 \le r_1^0) \le \epsilon$, e.g., for $\epsilon = .05$. A
    similar statement may be made concerning runs of b's.

(2) Total number of runs of a's and of b's, having distribution (g), §10.11. Again,
    the test consists in finding the largest value of $u$, say $u_0$, for which
    $\Pr(u \le u_0) \le \epsilon$ for given values of $n_1$ and $n_2$.

(3) At least one run of a's (or b's) of at least length $s$, for $n_1 = n_2 = n$, based
    on the distribution (i), §10.11. The test consists of finding the smallest
    value of $s$ for which probability (i) is $\le \epsilon$.

(4) At least one run of either a's or b's of at least length $s$, for $n_1 = n_2 = n$,
    based on the distribution (j), §10.11. The test consists of finding the smallest
    $s$ for which probability (j) is $\le \epsilon$.
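Test (1) above can be sketched in a few lines of Python. The sketch assumes that the law of $r_1$ referred to as (f) in §10.11 is $\binom{n_1-1}{r_1-1}\binom{n_2+1}{r_1}\big/\binom{n_1+n_2}{n_1}$, which is the standard result but is not reproduced in this section; the helper names are hypothetical, and the sample is assumed to contain no ties.

```python
from math import comb
from statistics import median


def runs_of_a_pmf(n1, n2):
    """Assumed law (f): probability of r1 runs of a's among n1 a's and n2 b's."""
    total = comb(n1 + n2, n1)
    return {r: comb(n1 - 1, r - 1) * comb(n2 + 1, r) / total
            for r in range(1, min(n1, n2 + 1) + 1)}


def lower_critical_value(pmf, eps):
    """Largest r0 with Pr(r1 <= r0) <= eps, or None if even the smallest value exceeds eps."""
    cumulative, r0 = 0.0, None
    for r in sorted(pmf):
        cumulative += pmf[r]
        if cumulative > eps:
            break
        r0 = r
    return r0


def dichotomize(xs):
    """Label values below the sample median 'a' and above it 'b'; the median itself is dropped."""
    m = median(xs)
    return ["a" if x < m else "b" for x in xs if x != m]


def count_runs(labels, kind):
    """Number of maximal blocks of consecutive labels equal to `kind`."""
    return sum(1 for i, c in enumerate(labels)
               if c == kind and (i == 0 or labels[i - 1] != kind))


if __name__ == "__main__":
    pmf = runs_of_a_pmf(10, 10)
    print("critical value:", lower_critical_value(pmf, 0.05))
    labels = dichotomize([5, 1, 4, 2, 3])
    print(labels, count_runs(labels, "a"))
```

An observed number of runs of a's at or below the critical value would lead to rejection of the hypothesis of randomness at level $\epsilon$ under these assumptions.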

The distribution theory of each of these tests has been determined under the assumption that the hypothesis of randomness is true, with a view to controlling only Type I (see §7.3) errors. Type II errors for these tests have never been investigated, i.e., probability theory of the tests when some alternative weighting scheme (other than equal weights) is used for the different possible arrangements of a's and b's.

It should be noted by the reader that the theory of runs developed in §10.11 is not applicable to the following type of problem of reducing a sample to two kinds of elements: Suppose $x_1, x_2, \ldots, x_n$ are elements of a sample from a population with a continuous distribution function. Consider an arbitrary order of these $n$ x's, and between each successive pair of elements write a if the left number of the pair is smaller than the right and b if it is larger. We then have reduced the sample to $n - 1$ a's and b's. We may define runs of a's and b's as before, but the theory of arrangements of the a's and b's as defined from the corresponding arrangements, and hence the distribution theory of runs of this type, is an unsolved problem in combinatorial statistics.
10.3 Matching Theory

A problem which frequently arises in combinatorial statistics is one which may be conveniently described by an example of card matching. Suppose each of two decks of ordinary playing cards is shuffled and let a card be dealt from each deck. If the two cards are of the same suit let us call the result a match. Let this procedure be continued until the entire 52 pairs of cards are dealt. There will be a total number of matches, say $h$. Each possible permutation of one deck compared with each possible permutation of the second deck will yield a value of $h$ between 0 and 52, inclusive. Therefore if we consider all of these possible permutations with equal weight, we inquire as to what will be the distribution function of $h$ in this set of permutations. Similarly if we consider three decks $D_1$, $D_2$, and $D_3$ of cards to be shuffled and matched we would have triple matches and three varieties of double matches. A triple match would occur if the three cards in a single dealing from the three decks were of the same suit. As for double matches, they would occur between decks $D_1$, $D_2$, between $D_1$, $D_3$, and between $D_2$, $D_3$. The problem arises as to what will be the distribution of triple matches and of the three varieties of double matches.

Extensions of the problem to more than three decks, to decks with arbitrary numbers of cards in each suit and an arbitrary number of suits suggest themselves at once. In this section we shall present some techniques for dealing with this problem without attempting to be exhaustive. It will be convenient to continue our discussion in card terminology, for no particular advantage is gained in introducing more general terminology. The generality of the results for objects or elements other than cards is obvious.

10.31 Case of Two Decks of Cards 

Suppose we have a deck $D_1$ of $n$ cards, each card belonging to one and only one of the $k$ suits $C_1, C_2, \ldots, C_k$. Let $n_{11}, n_{12}, \ldots, n_{1k}$ ($\sum_i n_{1i} = n$) be the numbers of cards belonging to $C_1, C_2, \ldots, C_k$, respectively. Let $D_2$ be another deck of $n$ cards, each card belonging to one and only one of the classes $C_1, C_2, \ldots, C_k$. Let $n_{21}, n_{22}, \ldots, n_{2k}$ ($\sum_i n_{2i} = n$) be the numbers of cards in $D_2$ belonging to $C_1, C_2, \ldots, C_k$, respectively.

The problem is to determine the probability of obtaining $h$ matches under the assumption of random pairing of the cards. In other words, we wish to find the number of ways the two decks of cards can be arranged so as to obtain exactly $h$ matches. Dividing this number by $N$, the total number of ways the two decks can be arranged, we obtain the probability of obtaining $h$ matches under random pairing. The value of $N$ is simply the total number of ways the two decks can be permuted, and is given by the product of two multinomial coefficients:

(a) $$N = \frac{n!}{n_{11}!\,n_{12}!\cdots n_{1k}!}\cdot\frac{n!}{n_{21}!\,n_{22}!\cdots n_{2k}!}.$$


To determine $N(h)$, consider the enumerating function¹

(b) $$\phi = \Bigl(\sum_{i,j=1}^{k} x_i\,y_j\,e^{\delta_{ij}\theta}\Bigr)^{n},$$

where $\delta_{ij} = 1$ if $i = j$, and $0$ if $i \ne j$. We associate the auxiliary variables $x_1, x_2, \ldots, x_k$ with the suits $C_1, C_2, \ldots, C_k$ respectively of the first deck, and the auxiliary variables $y_1, y_2, \ldots, y_k$ with the corresponding suits of the second deck. $\phi$ is the product of $n$ identical expressions, each expression consisting of the sum of $k^2$ terms, each term being a product of an $x$ and a $y$. The term $x_i y_j e^{\delta_{ij}\theta}$ in any one of the $n$ factors corresponds to the event of a card in suit $C_i$ of the first deck being paired against a card in suit $C_j$ of the second deck. If $i = j$ we have a match, and $e^{\theta}$ occurs as a factor. Now suppose we pick a typical term in the product given in (b). Such a term would be of the form

(c) $$\bigl(x_{i_1}y_{j_1}e^{\delta_{i_1j_1}\theta}\bigr)\bigl(x_{i_2}y_{j_2}e^{\delta_{i_2j_2}\theta}\bigr)\cdots\bigl(x_{i_n}y_{j_n}e^{\delta_{i_nj_n}\theta}\bigr).$$

This general term corresponds to the event of $n$ pairings as follows: a pairing between $C_{i_1}$ of $D_1$ and $C_{j_1}$ of $D_2$; a pairing between $C_{i_2}$ of $D_1$ and $C_{j_2}$ of $D_2$; ...; and a pairing between $C_{i_n}$ of $D_1$ and $C_{j_n}$ of $D_2$. Now if the compositions of $D_1$ and $D_2$ are specified as $n_{11}, n_{12}, \ldots, n_{1k}$ and $n_{21}, n_{22}, \ldots, n_{2k}$, respectively, then it follows that the only terms in the expansion of (b) which have any meaning for pairings of these two decks of cards are those of the form

(d) $$e^{h\theta}\,x_1^{n_{11}}x_2^{n_{12}}\cdots x_k^{n_{1k}}\;y_1^{n_{21}}y_2^{n_{22}}\cdots y_k^{n_{2k}},$$


where $h$ is an integer such that $0 \le h \le n$. It should be noted that such terms may not exist for some values of $h$ between $0$ and $n$, which means that it is not always possible to have any arbitrary number of matches for given deck compositions. The term given in (d) corresponds to some arrangement of the two decks of cards such that there are exactly $h$ matches. In general there are many such terms. Therefore, if we expand $\phi$ and determine the coefficient of the expression given by (d) we obtain the value of $N(h)$, the number of ways in which $h$ matches can occur. To simplify our notation let $K_h(\phi)$ denote the operation of taking the coefficient of expression (d) in the expansion of $\phi$. We may rewrite $\phi$ as

(e) $$\phi = \Bigl[\Bigl(\sum_i x_i y_i\Bigr)\bigl(e^{\theta}-1\bigr) + \Bigl(\sum_i x_i\Bigr)\Bigl(\sum_j y_j\Bigr)\Bigr]^{n}.$$

Expanding we have

(f) $$\phi = \sum_{g=0}^{n}\frac{n!}{g!\,(n-g)!}\Bigl(\sum_i x_i\Bigr)^{g}\Bigl(\sum_j y_j\Bigr)^{g}\bigl[e^{\theta}-1\bigr]^{n-g}\Bigl(\sum_i x_i y_i\Bigr)^{n-g}.$$

Expanding the expression in [ ], we have

(g) $$\bigl[e^{\theta}-1\bigr]^{n-g} = \sum_{h=0}^{n-g}(-1)^{n-g-h}\,\frac{(n-g)!}{h!\,(n-g-h)!}\,e^{h\theta}.$$

Inserting this expression into (f), and expanding $(\sum x_i)^{g}(\sum y_j)^{g}(\sum x_i y_i)^{n-g}$, we find

$$N(h) = K_h(\phi) = \sum_{g=0}^{n-h}(-1)^{n-g-h}\,\frac{n!}{g!\,h!\,(n-g-h)!}\,T_g,$$

where

$$T_g = \sum \frac{(g!)^2\,(n-g)!}{\prod_{i=1}^{k}\bigl[(n_{1i}-s_i)!\,(n_{2i}-s_i)!\,s_i!\bigr]},$$

the summation extending over all positive integral (or zero) values of the $s_i$ such that $\sum_i s_i = n - g$ and $n_{1i} - s_i \ge 0$, $n_{2i} - s_i \ge 0$, $i = 1, 2, \ldots, k$. The probability $P(h)$ of obtaining $h$ matches is therefore $N(h)/N$, where $N$ is given by (a).

¹Various authors have considered various enumerating functions, but the one which we shall use was devised by I. L. Battin, "On the Problem of Multiple Matching", Annals of Math. Stat., Vol. XIII (1942). Battin's function is relatively easy to handle and has the advantage of representing the two decks of cards symmetrically in the notation and operations. It extends readily to the case of several decks of cards. The reader should refer to Battin's paper for a fairly extensive bibliography on the matching problem.
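The count $N(h)$ can be checked on very small decks. The Python sketch below assumes the closed form $N(h) = \sum_g (-1)^{n-g-h}\,\frac{n!}{g!\,h!\,(n-g-h)!}\,T_g$ with $T_g = \sum (g!)^2 (n-g)! \big/ \prod_i [(n_{1i}-s_i)!\,(n_{2i}-s_i)!\,s_i!]$, and compares it with a direct enumeration of all distinguishable arrangements of the two decks. The function names are illustrative, and the enumeration is feasible only for a few cards.

```python
from collections import Counter
from itertools import permutations, product
from math import factorial


def bounded_compositions(total, caps):
    """Yield tuples s with sum(s) == total and 0 <= s[i] <= caps[i]."""
    if not caps:
        if total == 0:
            yield ()
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in bounded_compositions(total - first, caps[1:]):
            yield (first,) + rest


def t_g(g, n1, n2):
    """Coefficient T_g for deck compositions n1 and n2 (tuples of suit counts)."""
    n = sum(n1)
    caps = [min(a, b) for a, b in zip(n1, n2)]
    total = 0
    for s in bounded_compositions(n - g, caps):
        denom = 1
        for a, b, s_i in zip(n1, n2, s):
            denom *= factorial(a - s_i) * factorial(b - s_i) * factorial(s_i)
        total += factorial(g) ** 2 * factorial(n - g) // denom
    return total


def n_matches_formula(h, n1, n2):
    """N(h) from the alternating sum over g = 0, ..., n - h."""
    n = sum(n1)
    total = 0
    for g in range(n - h + 1):
        coef = factorial(n) // (factorial(g) * factorial(h) * factorial(n - g - h))
        total += (-1) ** (n - g - h) * coef * t_g(g, n1, n2)
    return total


def n_matches_enumerated(n1, n2):
    """Counter {h: number of pairs of distinguishable deck orders with h matches}."""
    deck1 = [i for i, c in enumerate(n1) for _ in range(c)]
    deck2 = [i for i, c in enumerate(n2) for _ in range(c)]
    orders1, orders2 = set(permutations(deck1)), set(permutations(deck2))
    return Counter(sum(a == b for a, b in zip(o1, o2))
                   for o1, o2 in product(orders1, orders2))


if __name__ == "__main__":
    n1, n2 = (2, 1, 1), (1, 2, 1)
    counts = n_matches_enumerated(n1, n2)
    for h in range(sum(n1) + 1):
        print(h, n_matches_formula(h, n1, n2), counts.get(h, 0))
```

Dividing each count by $N$ (here $12 \times 12 = 144$) gives the probabilities $P(h)$ for this small example.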

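For the case $k = 2$, the probability of $h$ matches reduces to the following expression

$$P(h) = \frac{n_{11}!\,n_{12}!\,n_{21}!\,n_{22}!}{n!\;i!\;j!\;(n_{21}-i)!\;(n_{22}-j)!},$$

where $i = \tfrac{1}{2}(n_{11} - n_{22} + h)$, $j = \tfrac{1}{2}(n_{11} + n_{22} - h)$. Unless $h$ is such that, for given values of $n_{11}$ and $n_{22}$, $n_{11} \pm (n_{22} - h)$ are positive even integers or $0$, then $P(h) = 0$.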
Greville* has given the distribution of $h$ in a slightly different form and by another method.

Moments of the random variable $h$ can be found directly from the enumerating function $\phi$. We have



It should be noted that our results can be readily extended to the case of two decks of cards in which the total numbers of cards are different or where one or more of the suits may have no cards at all. To consider the case of unequal total numbers of cards, say $n_1$ in deck $D_1$ and $n_2$ in deck $D_2$ where, without loss of generality, we can let $n_1 > n_2$, we simply add to $D_2$ $n_1 - n_2$ dummy cards, and consider them as a new suit. We would thus have $k + 1$ suits of cards, where the $(k+1)$-th suit is empty in $D_1$, i.e. $n_{1,k+1} = 0$, $n_{2,k+1} = n_1 - n_2$. The procedure from here on is just as before. The case in which some of the suits are empty in one or both decks is taken into account by specifying the values of the corresponding $n_{1i}$ or $n_{2i}$ as $0$ in expanding $\phi$ and collecting terms.

The reader should note that if a score $s_{ij}$ is assigned to a pairing in which the $D_1$ card belongs to the $i$-th suit and the $D_2$ card belongs to the $j$-th suit, then one can find the distribution of the total score $T$ in $n$ pairings (i.e., when the two decks are paired against each other) under the assumption of random matching, by replacing $\delta_{ij}$ by $s_{ij}$ in (b) and finding the coefficient of

$$e^{T\theta}\,x_1^{n_{11}}x_2^{n_{12}}\cdots x_k^{n_{1k}}\;y_1^{n_{21}}y_2^{n_{22}}\cdots y_k^{n_{2k}}$$

in the expansion of the resulting expression. The procedure for finding $E(T)$, $\sigma_T^2$, and higher moments is the same as that for dealing with the moments of $h$ with $s_{ij}$ substituted for $\delta_{ij}$.

*T. N. E. Greville, "The Frequency Distribution of a General Matching Problem", Annals of Math. Stat., Vol. XII (1941).

10.32 Case of Three or More Decks of Cards

Suppose we have a third deck of cards, say $D_3$. Let the numbers of cards belonging to suits $C_1, C_2, \ldots, C_k$ be $n_{31}, n_{32}, \ldots, n_{3k}$. A triple match has been defined as one in which the triplet of cards (one from each deck) are of the same suit. A double match between $D_1$ and $D_2$ will occur when the cards from $D_1$ and $D_2$ in a triplet are of the same suit but different from the suit of the card from $D_3$ in the triplet. Double matches between $D_1$, $D_3$ and $D_2$, $D_3$ are similarly defined. If in the complete set of $n$ triplets from the three decks we let $h_{123}$ be the number of triple matches, $h_{12}$ the number of double matches between $D_1$, $D_2$, with similar meanings for $h_{13}$ and $h_{23}$, we may obtain the distributions and moments of the $h$'s from the following enumerating function:

(a) $$\phi = \Bigl(\sum_{i,j,k} x_i\,y_j\,z_k\,e^{\delta_{ijk}\theta_{123} + \delta_{ij}\theta_{12} + \delta_{ik}\theta_{13} + \delta_{jk}\theta_{23}}\Bigr)^{n},$$

where $\delta_{ijk} = 1$ if $i = j = k$, and $0$ otherwise. The remaining $\delta$'s are defined as for the 2-deck problem. By following an argument similar to that for the 2-deck problem, it will be noted that the number of ways in which the three decks of cards can be permuted so as to obtain $h_{123}$ triple matches, and $h_{12}$, $h_{13}$, $h_{23}$ double matches between $D_1$, $D_2$; $D_1$, $D_3$; and $D_2$, $D_3$, respectively, is given by the coefficient of

(b) $$e^{h_{123}\theta_{123} + h_{12}\theta_{12} + h_{13}\theta_{13} + h_{23}\theta_{23}}\cdot Q,$$

where

$$Q = x_1^{n_{11}}x_2^{n_{12}}\cdots x_k^{n_{1k}}\;y_1^{n_{21}}y_2^{n_{22}}\cdots y_k^{n_{2k}}\;z_1^{n_{31}}z_2^{n_{32}}\cdots z_k^{n_{3k}},$$

in the expansion of $\phi$.

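This coefficient and hence the joint probability law of the $h$'s is rather cumbersome and will not be given here. As in the case of the 2-deck problem we may find moments and joint moments of the $h$'s by performing differentiations on $\phi$ with respect to the $\theta$'s, that is,

(c) $$E\bigl(h_{123}^{a}\,h_{12}^{b}\,h_{13}^{c}\,h_{23}^{d}\bigr) = \frac{1}{N}\,\Bigl\{\text{coefficient of } Q \text{ in } \frac{\partial^{\,a+b+c+d}\phi}{\partial\theta_{123}^{a}\,\partial\theta_{12}^{b}\,\partial\theta_{13}^{c}\,\partial\theta_{23}^{d}}\Big|_{\theta\text{'s}=0}\Bigr\},$$

where

$$N = \prod_{i=1}^{3}\frac{n!}{n_{i1}!\,n_{i2}!\cdots n_{ik}!}.$$

The mean values of the $h$'s are the following:

$$E(h_{123}) = \frac{1}{n^2}\sum_{i} n_{1i}\,n_{2i}\,n_{3i},$$

$$E(h_{12}) = \frac{1}{n^2}\sum_{i} n_{1i}\,n_{2i}\sum_{j \ne i} n_{3j},$$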


with similar expressions for $E(h_{13})$ and $E(h_{23})$. The reader may refer to Battin's paper for second moments.

The extension of our technique to the problem of determining the distribution and moments of the numbers of hits for various orders of multiple matching when more than three decks of cards are involved is immediate. The extension of the results to the case of decks of unequal numbers of cards, empty suits, etc., when three or more decks are considered, is straightforward.

10.4 Independence in Contingency Tables. 


In this section we shall consider the problem of testing the independence of a two-way classification on the basis of a sample of $n$ elements, each element belonging to one and only one of the classes $A_1, A_2, \ldots, A_r$ and to one and only one of the classes $B_1, B_2, \ldots, B_s$. In the sample, let $n_{ij}$ be the number of elements belonging to $A_i$ and $B_j$. Let $\sum_j n_{ij} = n_{i.}$, $\sum_i n_{ij} = n_{.j}$, $\sum_{i,j} n_{ij} = n$. The number of elements belonging to $A_i$ is $n_{i.}$ and the number belonging to $B_j$ is $n_{.j}$. The problem is to test the hypothesis of the independence of the A and B classification. We shall consider two approaches to this problem. The first (§10.41) is a pure combinatorial approach based on partition theory in which the set of all possible partitions of $n$ into $rs$ components $n_{ij}$ satisfying the marginal conditions listed above is investigated. The second approach (§10.42), which is Karl Pearson's original treatment of the problem, is an application of the theory of sampling from a multinomial population consisting of the $rs$ classes $(A_iB_j)$, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$.

10.41 The Partitional Approach

In this section we shall consider the problem of determining the number of ways of partitioning the integer $n$ into $rs$ integers (or zero) $n_{ij}$ ($i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$) such that $\sum_j n_{ij} = n_{i.}$ and $\sum_i n_{ij} = n_{.j}$ are fixed. The technique discussed in §10.3 can be extended so as to accomplish this enumeration. We shall then find the mean values of certain functions of the $n_{ij}$ over this set of partitions.

We may represent the $n_{ij}$, $n_{i.}$, $n_{.j}$ and $n$ in the following table:

$$\begin{array}{cccc|c}
 & & & & \text{Total} \\
n_{11} & n_{12} & \cdots & n_{1s} & n_{1.} \\
n_{21} & n_{22} & \cdots & n_{2s} & n_{2.} \\
\vdots & \vdots & & \vdots & \vdots \\
n_{r1} & n_{r2} & \cdots & n_{rs} & n_{r.} \\ \hline
n_{.1} & n_{.2} & \cdots & n_{.s} & n
\end{array}$$

Consider the enumerating function

(b) $$\phi = \prod_{i=1}^{r}\Bigl(\sum_{j=1}^{s} x_j\,e^{\theta_{ij}}\Bigr)^{n_{i.}},$$

which is the product of $n$ factors, of which $n_{1.}$ are $\sum_j x_j e^{\theta_{1j}}$, $n_{2.}$ of which are $\sum_j x_j e^{\theta_{2j}}$, and so on. A typical term in the expansion of this product of $n$ factors is of the form

(c) $$\bigl(x_1e^{\theta_{11}}\bigr)^{n_{11}}\bigl(x_2e^{\theta_{12}}\bigr)^{n_{12}}\cdots\bigl(x_se^{\theta_{rs}}\bigr)^{n_{rs}},$$

where $n_{11}$ is the number of times $x_1e^{\theta_{11}}$ is taken from the $n_{1.}$ factors $(\sum_j x_je^{\theta_{1j}})$, with similar meanings for $n_{12}, \ldots, n_{rs}$. If $\sum_i n_{ij} = n_{.j}$, then (c) corresponds to one way of partitioning $n$ into the set $n_{ij}$ so that $\sum_j n_{ij} = n_{i.}$ and $\sum_i n_{ij} = n_{.j}$. To find the total number of ways of partitioning $n$ into the given set $n_{ij}$ we must determine how many individual terms in the expansion of (b) are identical with (c). In other words we are to find the coefficient of

(d) $$\Bigl(\prod_{j=1}^{s} x_j^{n_{.j}}\Bigr)\,e^{\sum_{i,j} n_{ij}\theta_{ij}}$$

in the expansion of (b).

Expanding each of the terms $(\sum_j x_je^{\theta_{ij}})^{n_{i.}}$, $i = 1, 2, \ldots, r$, by the multinomial law and multiplying the results and taking the coefficient of the expression (d), we find at once that the number of partitions of $n$ into the sets of values $n_{ij}$, subject to the marginal conditions $\sum_j n_{ij} = n_{i.}$, $\sum_i n_{ij} = n_{.j}$, is

(e) $$\frac{\prod_{i=1}^{r} n_{i.}!}{\prod_{i,j} n_{ij}!}.$$

The total number of ways of partitioning $n$, subject to the marginal conditions mentioned above, is

(f) $$\frac{n!}{\prod_{j=1}^{s} n_{.j}!}.$$

Therefore the probability of partitioning $n$ into the particular set of values $n_{ij}$, assuming all ways of making partitions (subject to the marginal conditions) equally likely, is given by the ratio of expression (e) to expression (f).

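The moments of the $n_{ij}$ may be found directly from the probability law of the $n_{ij}$. Consider first the problem of determining the $h$-th factorial moment of a particular $n_{ij}$, say $n_{\alpha\beta}$. We have

(g) $$E\bigl(n_{\alpha\beta}^{(h)}\bigr) = \frac{\sum{}'\; n_{\alpha\beta}^{(h)}\,\prod_i n_{i.}!\,\prod_j n_{.j}!\big/\prod_{i,j} n_{ij}!}{n!},$$

where $\sum{}'$ denotes summation over all values of the $n_{ij}$ subject to the usual marginal conditions. Now when $h = 0$, we know that the right hand side of (g) is simply the sum of the probability function of the $n_{ij}$ over all possible values of the $n_{ij}$ and is therefore unity, which amounts to the statement that

(h) $$\sum{}'\,\frac{\prod_i n_{i.}!}{\prod_{i,j} n_{ij}!} = \frac{n!}{\prod_j n_{.j}!}.$$

Now the numerator on the right hand side of (g) may be written as

(i) $$n_{\alpha.}^{(h)}\,n_{.\beta}^{(h)}\,\Bigl(\prod_j n'_{.j}!\Bigr)\sum{}'\,\frac{\prod_i n'_{i.}!}{\prod_{i,j} n'_{ij}!},$$

where $n'_{i.} = n_{i.}$ for all $i$ except $i = \alpha$ and $n'_{\alpha.} = (n_{\alpha.} - h)$, and $n'_{ij} = n_{ij}$ for all $i, j$ except for $i = \alpha$, $j = \beta$ and $n'_{\alpha\beta} = n_{\alpha\beta} - h$. Now perform the summation indicated in (i) over all values of the $n'_{ij}$ subject to the conditions $\sum_j n'_{ij} = n'_{i.}$ and $\sum_i n'_{ij} = n'_{.j}$, where $n'_{.j} = n_{.j}$ except when $j = \beta$ and $n'_{.\beta} = n_{.\beta} - h$. It follows from (h) that the value of this sum is

$$\frac{(n-h)!}{\prod_j n'_{.j}!}.$$

Therefore we have

(j) $$E\bigl(n_{\alpha\beta}^{(h)}\bigr) = \frac{n_{\alpha.}^{(h)}\,n_{.\beta}^{(h)}}{n^{(h)}}.$$

A similar expression holds for $\alpha \ne \gamma$, $\beta \ne \delta$. The restrictions on the size of $g$ and $h$ are obvious. These moments can also be found directly from the enumerating function $\phi$ by carrying out appropriate differentiations on the $\theta$'s, then setting the $\theta$'s $= 0$ and collecting appropriate coefficients.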

The criterion which Karl Pearson defined for testing the hypothesis of row-column independence in $r$ by $s$ contingency tables is defined as follows

(n) $$\chi^2 = \sum_{i,j}\frac{\bigl(n_{ij} - \frac{n_{i.}n_{.j}}{n}\bigr)^2}{\frac{n_{i.}n_{.j}}{n}},$$

which is a quadratic form in the $n_{ij}$. It should be noted that $\chi^2$ is simply the sum of the squared differences between each $n_{ij}$ and its mean value (under the assumption of independence or "randomness"), each squared difference weighted inversely by the mean value of the corresponding $n_{ij}$. This inverse weighting scheme suggests itself fairly readily in the Pearson approach to be considered in §10.42. The mean value of $\chi^2$ may easily be found by making use of formulas (k) and (l), and is

(o) $$E(\chi^2) = \frac{n}{n-1}\,(r-1)(s-1).$$

By using formulas (j) and (m) for the appropriate values of $g$ and $h$, the variance and higher moments of $\chi^2$ may be found.
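The mean value of $\chi^2$ over the set of partitions can be verified exactly on a small table. The Python sketch below enumerates every table with given margins, weights each by $\prod_i n_{i.}!\,\prod_j n_{.j}!\,/\,(n!\,\prod_{i,j} n_{ij}!)$ (the ratio of the count of partitions to the total), and averages Pearson's criterion with exact rational arithmetic. It assumes all marginal totals are positive; the function names are illustrative.

```python
from fractions import Fraction
from math import factorial, prod


def bounded_compositions(total, caps):
    """Yield tuples c with sum(c) == total and 0 <= c[j] <= caps[j]."""
    if not caps:
        if total == 0:
            yield ()
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in bounded_compositions(total - first, caps[1:]):
            yield (first,) + rest


def tables(rows, cols):
    """Yield every table of non-negative integers with row sums `rows` and column sums `cols`."""
    if len(rows) == 1:
        if rows[0] == sum(cols):
            yield (tuple(cols),)
        return
    for first in bounded_compositions(rows[0], cols):
        remaining = tuple(c - f for c, f in zip(cols, first))
        for rest in tables(rows[1:], remaining):
            yield (first,) + rest


def table_probability(table, rows, cols):
    """Probability of a table when all partitions with these margins are equally likely."""
    n = sum(rows)
    num = prod(factorial(a) for a in rows) * prod(factorial(b) for b in cols)
    den = factorial(n) * prod(factorial(x) for row in table for x in row)
    return Fraction(num, den)


def chi_square(table, rows, cols):
    """Pearson's criterion with expected values n_i. n_.j / n, as an exact fraction."""
    n = sum(rows)
    total = Fraction(0)
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            expected = Fraction(rows[i] * cols[j], n)
            total += (x - expected) ** 2 / expected
    return total


def mean_chi_square(rows, cols):
    return sum(table_probability(t, rows, cols) * chi_square(t, rows, cols)
               for t in tables(rows, cols))


if __name__ == "__main__":
    rows, cols = (3, 2), (2, 2, 1)
    n, r, s = sum(rows), len(rows), len(cols)
    print(mean_chi_square(rows, cols), Fraction(n, n - 1) * (r - 1) * (s - 1))
```

For the margins shown, both printed values are the same fraction, in agreement with $E(\chi^2) = \frac{n}{n-1}(r-1)(s-1)$.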

10.42 Karl Pearson's Original Chi-Square Problem and its Application to Contingency Tables

Suppose $\pi$ is a multinomial population in which each element belongs to one and only one of the classes $C_1, C_2, \ldots, C_k$. Let $p_1, p_2, \ldots, p_k$ ($\sum_i p_i = 1$) be the probabilities associated with $C_1, C_2, \ldots, C_k$ respectively. In a sample of size $n$ let $n_1, n_2, \ldots, n_k$ be the numbers of elements falling into $C_1, C_2, \ldots, C_k$ respectively. We have seen (§5.12) that the probability law of the $n_i$ is

(a) $$\frac{n!}{n_1!\,n_2!\cdots n_k!}\;p_1^{n_1}p_2^{n_2}\cdots p_k^{n_k}.$$


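It was shown in §5.12 that $E(n_i) = np_i$. In view of the Central Limit Theorem (§4.21) it is clear that the limiting distribution as $n \to \infty$ of each of the quantities

$$\frac{n_i - np_i}{\sqrt{np_i(1-p_i)}}, \qquad i = 1, 2, \ldots, k,$$

is $N(0,1)$. Now let us investigate the limiting joint distribution of the set

(b) $$x_i = \frac{n_i - np_i}{\sqrt{n}}, \qquad i = 1, 2, \ldots, k.$$

Since $\sum_{i=1}^{k} x_i = 0$, only $k - 1$ of the $x_i$ are functionally independent. It is sufficient to consider the limiting joint distribution of the first $k - 1$ of the $x_i$. The m. g. f. of $x_1, x_2, \ldots, x_{k-1}$ is

$$\phi = E\Bigl(e^{\sum\theta_i x_i}\Bigr) = E\Bigl(e^{\sum\theta_i(n_i - np_i)/\sqrt{n}}\Bigr) = e^{-\sqrt{n}\sum\theta_i p_i}\sum\frac{n!}{\prod_{i=1}^{k} n_i!}\,\bigl(p_1e^{\theta_1/\sqrt{n}}\bigr)^{n_1}\cdots\bigl(p_{k-1}e^{\theta_{k-1}/\sqrt{n}}\bigr)^{n_{k-1}}p_k^{n_k}$$

$$= e^{-\sqrt{n}\sum\theta_i p_i}\,\bigl(p_1e^{\theta_1/\sqrt{n}} + p_2e^{\theta_2/\sqrt{n}} + \cdots + p_{k-1}e^{\theta_{k-1}/\sqrt{n}} + p_k\bigr)^{n},$$

the sums in the exponents extending over $i = 1, 2, \ldots, k-1$.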


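Expanding each of the exponentials in ( ) and taking logarithms, we have

(c) $$\log\phi = -\sqrt{n}\sum_{i=1}^{k-1}\theta_ip_i + n\log\Bigl[1 + \frac{1}{\sqrt{n}}\sum_{i=1}^{k-1}p_i\theta_i + \frac{1}{2n}\sum_{i=1}^{k-1}p_i\theta_i^2 + O\bigl(n^{-3/2}\bigr)\Bigr] = \frac{1}{2}\Bigl[\sum_{i=1}^{k-1}p_i\theta_i^2 - \Bigl(\sum_{i=1}^{k-1}p_i\theta_i\Bigr)^2\Bigr] + O\Bigl(\frac{1}{\sqrt{n}}\Bigr).$$

Therefore we have

(d) $$\lim_{n\to\infty}\phi = e^{\frac{1}{2}\sum_{i,j=1}^{k-1}A^{ij}\theta_i\theta_j},$$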

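where $A^{ij} = \delta_{ij}p_i - p_ip_j$, $i, j = 1, 2, \ldots, k-1$, where $\delta_{ij} = 1$, $i = j$, and $0$, $i \ne j$. Making use of the multivariate analogue of Theorem (C), §2.81, it follows that the limiting probability element for the distribution of the $x_i$ is

(e) $$\frac{\sqrt{|A_{ij}|}}{(2\pi)^{(k-1)/2}}\;e^{-\frac{1}{2}\sum_{i,j=1}^{k-1}A_{ij}x_ix_j}\,dx_1\cdots dx_{k-1},$$

where $||A_{ij}|| = ||A^{ij}||^{-1}$. It may be readily verified by the reader that

(f) $$A_{ij} = \frac{\delta_{ij}}{p_i} + \frac{1}{p_k},$$

and hence

(g) $$\sum_{i,j=1}^{k-1}A_{ij}x_ix_j = \sum_{i=1}^{k-1}\frac{x_i^2}{p_i} + \frac{1}{p_k}\Bigl(\sum_{i=1}^{k-1}x_i\Bigr)^2 = \sum_{i=1}^{k}\frac{x_i^2}{p_i}.$$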

We have seen (§5.22) that if $x_1, x_2, \ldots, x_{k-1}$ are random variables having distribution (e) then $\sum_{i,j=1}^{k-1}A_{ij}x_ix_j$ is distributed according to a $\chi^2$-law with $k - 1$ degrees of freedom. Now if we replace $x_i$ by $(n_i - np_i)/\sqrt{n}$ in (g), denoting the result by $\chi^2$, we obtain

(h) $$\chi^2 = \sum_{i=1}^{k}\frac{(n_i - np_i)^2}{np_i}.$$

We conclude that the limiting distribution of $\chi^2$ is identical with the distribution of $\sum_{i,j=1}^{k-1}A_{ij}x_ix_j$ where the $x_i$ are distributed according to (e); that is to say, the limiting distribution of the expression in (h) is the $\chi^2$-law with $k - 1$ degrees of freedom. A rigorous proof of this statement is beyond the scope of this course, but it is a consequence of the following theorem which will be stated without proof:
Theorem (A): Let $x_1^{(n)}, x_2^{(n)}, \ldots, x_r^{(n)}$ be random variables having a joint c. d. f. $F_n(x_1, x_2, \ldots, x_r)$ for each $n$ greater than some $n_0$. Let the limiting joint c. d. f. as $n \to \infty$ be $F(x_1, x_2, \ldots, x_r)$. Let $g(x_1, x_2, \ldots, x_r)$ be a (Borel measurable) function of $x_1, x_2, \ldots, x_r$. Then the limiting c. d. f., say $P(g)$, of $g(x_1^{(n)}, x_2^{(n)}, \ldots, x_r^{(n)})$ as $n \to \infty$ is given by

$$P(g) = \int_R dF(x_1, x_2, \ldots, x_r),$$

where $R$ is the region in the $x$ space for which $g(x_1, x_2, \ldots, x_r) \le g$.

We may summarize our results therefore in the following theorem, which is, in fact, a corollary of Theorem (A):

Theorem (B): Let $O_n$ be a sample of size $n$ from the multinomial distribution (a). Then the limiting distribution of $\sum_{i=1}^{k}\frac{(n_i - np_i)^2}{np_i}$ as $n \to \infty$ is the $\chi^2$-distribution with $k - 1$ degrees of freedom.
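One consequence that can be checked exactly for any finite $n$ is that the mean of $\sum_i (n_i - np_i)^2/(np_i)$ equals $k - 1$, the mean of the limiting $\chi^2$-law. The Python sketch below builds the exact distribution of the statistic by enumerating all multinomial outcomes with rational arithmetic; the probabilities and sample size are arbitrary illustrative choices, and the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial


def compositions(n, k):
    """Yield all k-tuples of non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_probability(counts, probs):
    """n!/(n_1! ... n_k!) p_1^n_1 ... p_k^n_k as an exact fraction."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    value = Fraction(coef)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value


def pearson_statistic(counts, probs):
    """Sum of (n_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def exact_distribution(n, probs):
    """Exact probability of each attainable value of the statistic."""
    dist = defaultdict(Fraction)
    for counts in compositions(n, len(probs)):
        dist[pearson_statistic(counts, probs)] += multinomial_probability(counts, probs)
    return dict(dist)


if __name__ == "__main__":
    probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
    dist = exact_distribution(6, probs)
    print("mean:", sum(value * weight for value, weight in dist.items()))
```

The exact mean agrees with $k - 1$ for every $n$; the shape of the distribution approaches the $\chi^2$-law only as $n$ grows.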

Now let us consider the contingency problem described in the introduction of §10.4. In this case the multinomial population consists of $rs$ classes $(A_iB_j)$, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$. Let the probability associated with $(A_iB_j)$ be $p_{ij}$ ($\sum_{i,j} p_{ij} = 1$). It follows at once by Theorem (B) that

(i) $$\sum_{i,j}\frac{(n_{ij} - np_{ij})^2}{np_{ij}}$$

has as its limiting distribution for $n \to \infty$ the $\chi^2$-law with $rs - 1$ degrees of freedom. If the $p_{ij}$ were known a priori, then the test given in (i) could be used for testing the hypothesis that the sample originated from a multinomial distribution having these values of $p_{ij}$. If the A and B classifications are independent in the probability sense then $p_{ij} = p_iq_j$ ($\sum_i p_i = 1$, $\sum_j q_j = 1$). If $p_i$ and $q_j$ were known a priori then (i) with $p_{ij} = p_iq_j$ can, of course, be used to test the hypothesis that the sample came from a multinomial population with probabilities $p_iq_j$.

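But suppose neither the $p_i$ nor $q_j$ are known a priori, and that we wish merely to test the hypothesis of independence of the A and B classifications. Karl Pearson proposed the following test for this hypothesis

(j) $$\chi_0^2 = \sum_{i,j}\frac{\bigl(n_{ij} - \frac{n_{i.}n_{.j}}{n}\bigr)^2}{\frac{n_{i.}n_{.j}}{n}},$$

where the $n_{i.}$ and $n_{.j}$ are defined in §10.41. If we let

$$x_{ij} = \frac{n_{ij} - np_iq_j}{\sqrt{n}}$$

and express the $n_{ij}$ in (j) in terms of the $x_{ij}$ we obtain

$$\chi_0^2 = \sum_{i,j}\frac{\bigl(x_{ij} - x_{i.}q_j - x_{.j}p_i - x_{i.}x_{.j}/\sqrt{n}\bigr)^2}{\bigl(p_i + x_{i.}/\sqrt{n}\bigr)\bigl(q_j + x_{.j}/\sqrt{n}\bigr)},$$

where $x_{i.} = \sum_j x_{ij}$ and $x_{.j} = \sum_i x_{ij}$.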

By following an argument similar to that used in determining the limiting distribution (e) of the $x_i$, $i = 1, 2, \ldots, k-1$, we may find the limiting distribution of the $x_{ij}$ (all $i, j$ except $i = r$, $j = s$) to be normal multivariate. From this limiting distribution one finds that the distribution of $\sum_{i,j}(x_{ij} - x_{i.}q_j - x_{.j}p_i)^2/(p_iq_j)$ is the $\chi^2$ distribution with $(r-1)(s-1)$ degrees of freedom. By an argument similar to that embodied in Theorem (A) we may make the following statement:

Theorem (C): Let $O_n$ be a sample from a multinomial population with the $rs$ mutually exclusive classes $(A_iB_j)$, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$, in which the probability associated with $(A_iB_j)$ is $p_iq_j$. Let $\chi_0^2$ be defined as in (j). Then the limiting distribution of $\chi_0^2$ as $n \to \infty$ is the $\chi^2$-distribution with $(r-1)(s-1)$ degrees of freedom.

The reader may verify that the likelihood ratio criterion for testing the hypothesis specified by

$$\Omega:\; p_{ij} > 0,\quad \sum_{i,j} p_{ij} = 1,$$

$$\omega:\; p_{ij} = p_iq_j,\quad \sum_i p_i = \sum_j q_j = 1,$$

that is, the hypothesis that the A and B classifications are independent, is given by

$$\lambda = \frac{\bigl(\prod_i n_{i.}^{\,n_{i.}}\bigr)\bigl(\prod_j n_{.j}^{\,n_{.j}}\bigr)}{n^{n}\,\prod_{i,j} n_{ij}^{\,n_{ij}}}.$$

It follows from Theorem (A), §7.2, that when the hypothesis of independence is true, the limiting distribution of $-2\log\lambda$ is the $\chi^2$-distribution with $(r-1)(s-1)$ degrees of freedom.

10.5 Sampling Inspection

In a mass production process, suppose articles are produced in lots of $N$ articles each, and suppose each article, upon inspection, can be classified as defective or non-defective. It is often uneconomical to carry out a program of 100% inspection. As an alternative, sampling methods of inspection applicable to each lot have been developed which have the property of guaranteeing that the percentage of defectives remaining after applying the sampling inspection procedure in the long run (i.e. to a large number of lots) is not more than some preassigned value. Such sampling methods have been developed and put into operation by Dodge and Romig* of the Bell Telephone Laboratories. It should be pointed out that these sampling methods are essentially screening devices for reducing defectives after production, and are not devices for removing the causes of defectives. Methods for detecting the existence of causes of such defectives must be introduced further back into the production operations. In particular, statistical quality control methods originally introduced by Shewhart† have been found useful in connection with this problem.

The mathematical problem involved in sampling inspection is one in combinatorial statistics. Dodge and Romig have developed two types of inspection sampling, single sampling and double sampling, which will be considered in turn. From a mathematical point of view, many sampling inspection schemes can be devised which guarantee quality of outgoing products in the sense mentioned above.

10.51 Single Sampling Inspection

Let $p$ be the fraction of defectives in a lot $L_N$ of size $N$. The number of defectives will be $pN$. Now let a sample $O_n$ of size $n$ be drawn from $L_N$. Giving all possible samples of size $n$ equal weight, the probability of obtaining $m$ defectives (and $n - m$ non-defectives or conforming articles) in $O_n$ is

(a) $$P_{m,n;pN,N} = \frac{\binom{N-pN}{n-m}\binom{pN}{m}}{\binom{N}{n}}, \qquad m = 0, 1, 2, \ldots, r,$$

where $r$ is the smaller of $n$ and $Np$. Let

(b) $$F(c;p,N,n) = \Pr(m \le c) = \sum_{m=0}^{c} P_{m,n;pN,N}.$$

It is easy to verify that if any two values $p$ and $p'$ ($pN$ and $p'N$ being integers) are such that $p < p'$ then

(c) $$F(c;p,N,n) > F(c;p',N,n).$$

*H. F. Dodge and H. G. Romig, "A Method of Sampling Inspection", Bell System Technical Journal, Vol. VIII (1929) and "Single Sampling and Double Sampling Inspection Tables", Bell System Technical Journal, Vol. XX (1941).

†See "Guide for Quality Control and Control Chart Method of Analyzing Data" (1941) and "Control Chart Method of Controlling Quality During Production" (1942), American Standards Association, New York.
Let $p_t$ be the lot tolerance fraction defective, i.e. the maximum allowable fraction defective in a lot, which is arbitrarily chosen in advance (e.g., .01). Let

$$P_C = F(c;p_t,N,n).$$

$P_C$ is known as the consumer's risk; it is (approximately) the probability that a lot with lot tolerance fraction defective $p_t$ will be accepted without 100% inspection. It follows from (c) that if the lot fraction defective $p$ exceeds $p_t$ then the probability of accepting such a lot on the basis of the sample is less than the consumer's risk. The probability of subjecting a lot with fraction defective actually equal to $p$ (process average) to 100% inspection is

(d) $$P_P = 1 - F(c;p,N,n),$$

which is called the producer's risk. It will be noted from (c) that the smaller the value of $p$, the smaller will be the producer's risk.

The reader should observe that producer risk and consumer risk are highly analogous to Type I and Type II errors, respectively (see §7.3), in the theory of testing statistical hypotheses as developed by Neyman and Pearson. In fact, historically speaking, the concept of producer and consumer risks in sampling inspection may be considered as the forerunner of the concept of Type I and Type II errors in the theory of testing statistical hypotheses.

Now suppose we make the following rules of action with reference to a sampled lot where $c$ is chosen for given values of $P_C$, $p_t$, $N$, $n$:

(1) Inspect a sample of $n$ articles.

(2) If the number of defectives in the sample does not exceed $c$, accept the lot.

(3) If the number of defectives in the sample exceeds $c$, inspect the remainder of
    the lot.

(4) Replace all defectives found by conforming articles.

Now let us consider the problem of determining the mean value of the fraction of defectives remaining in a lot having fraction defective $p$, after applying rules of action (1) to (4).

The probability of obtaining $m$ defectives in a sample of size $n$ is given by (a). If these $m$ defectives are replaced by conforming articles and the sample is returned to the lot, the lot will contain $pN - m$ defectives. Hence the probability of accepting a lot with $pN - m$ defectives is given by (a), $m = 0, 1, 2, \ldots, c$. The probability of inspecting the lot 100% is $1 - F(c;p,N,n)$, which, of course, is the probability of accepting a lot with no defectives. Therefore the mean value of the fraction of defectives remaining after applying rules (1) to (4) is

(e) $$\tilde{p} = \sum_{m=0}^{c}\frac{pN - m}{N}\,P_{m,n;pN,N}.$$
The statistical interpretation of (e) is as follows: If a large number of lots
each with fraction defective p are inspected according to rules (1) to (4), then the
average fraction defective in all of these lots after inspection is $\tilde{p}$. For given values
of c, n, and N, $\tilde{p}$ is a function of p, defined for those values of p for which Np is an
integer, which has a maximum with respect to p. Denoting this maximum by $\tilde{p}_L$, it is
called the average outgoing quality limit. It can be shown that the larger the value of p
beyond the value maximizing $\tilde{p}$, the smaller will be the value of $\tilde{p}$. The reason for this,
of course, is that the greater the value of p, the greater the probability that each lot
will have to be inspected 100%. If the consumer risk, n, and N are chosen in advance,
then, of course, c and hence $\tilde{p}_L$ is determined. Thus, we are able to make the following
statistical interpretation of these results:

If rules (1), (2), (3) and (4) are followed for lot after lot and for given
values of c, n, N, the average fraction defective per lot after inspection never exceeds
$\tilde{p}_L$, no matter what fractions defective exist in the lots before the inspection.

It is clear that there are various combinations of values of c and n, each
having a $\tilde{p}$ with maximum $\tilde{p}_L$ (approximately) with respect to p.

The mean value of the number of articles inspected per lot for lots having fraction
defective p is given by

(f) $I = n + (N-n)(1 - P(c;p,N,n))$,

since n (the number in the sample) will be inspected in every lot and N - n (the remainder
in the lot) will be inspected if the number of defectives in the sample exceeds c.
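
As a rough numerical illustration of the quantities just defined, the following sketch evaluates the acceptance probability, the two risks, the mean outgoing fraction defective and formula (f). It assumes that (a) is the usual hypergeometric probability for sampling without replacement; the plan N = 1000, n = 100, c = 2, the lot tolerance 0.05 and the process average 0.01 are hypothetical values, and the function names are illustrative only.

```python
from math import comb

def hyper_pmf(m, n, D, N):
    """Probability of m defectives in a sample of n drawn without
    replacement from a lot of N articles containing D defectives
    (assumed form of (a), with D = pN)."""
    if m < 0 or D < 0 or m > min(n, D) or n - m > N - D:
        return 0.0
    return comb(D, m) * comb(N - D, n - m) / comb(N, n)

def accept_prob(c, D, N, n):
    """P(c; p, N, n): probability of at most c defectives in the sample."""
    return sum(hyper_pmf(m, n, D, N) for m in range(c + 1))

def outgoing_quality(c, D, N, n):
    """Mean fraction defective remaining after rules (1) to (4)."""
    return sum((D - m) / N * hyper_pmf(m, n, D, N) for m in range(c + 1))

def mean_inspected(c, D, N, n):
    """Formula (f): I = n + (N - n)(1 - P(c; p, N, n))."""
    return n + (N - n) * (1 - accept_prob(c, D, N, n))

N, n, c = 1000, 100, 2                          # hypothetical sampling plan
consumer_risk = accept_prob(c, 50, N, n)        # lot tolerance p_t = 0.05
producer_risk = 1 - accept_prob(c, 10, N, n)    # process average 0.01
# Average outgoing quality limit: maximum over all p with Np an integer.
aoql = max(outgoing_quality(c, D, N, n) for D in range(N + 1))
```

Scanning D from 0 to N mirrors the remark that the outgoing quality is defined only for those p for which Np is an integer.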

Thus, we have two methods of specifying consumer protection: (i) Lot quality
protection, obtained by specifying lot tolerance fraction defective $p_t$ and consumer's risk
$P_c$; (ii) Average quality protection, in which average outgoing quality limit $\tilde{p}_L$ is
specified.

By considering the various combinations of values of c and n corresponding to a
given consumer's risk (or to a given average outgoing quality limit) there is, in general,
a unique combination, for a given p and N, for which I is smaller than for any other.
Such a combination of values of n and c, together with a value of p as near to its actual
value $\bar{p}$ in the incoming lots as one can "obtain", is, from a practical point of view, the
combination to use, since the amount of inspection is reduced to a minimum.

Extensive tabulations of pairs of values of c and n, for consumer's risk = 0.10,
for values of N from 1 to 100,000, for lot tolerance fraction defective from .005 to .10,
and for process average from .00005 to .05, all of the variables broken down into suitable
groupings, have been prepared by Dodge and Romig. They have also made tabulations of pairs
of values of c and n for given values of outgoing quality limit $\tilde{p}_L$ from .001 to .10, for
values of N from 1 to 100,000 and for values of process average from .00002 to .10. Numerous
approximations have been made to formulas (a), (b), (d), (e) and (f) for computation
purposes, which the reader may refer to in the papers cited. For example, it is easy
to verify that the Poisson law $e^{-pn}(pn)^m/m!$ is a good approximation to (a) if p and $n/N$ are
both small, say < 0.10.
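
The quality of the Poisson approximation can be checked directly. The sketch below again assumes that (a) is the hypergeometric probability, and uses a hypothetical lot with N = 10000, n = 200 and p = 0.01, so that both p and n/N are well below 0.10.

```python
from math import comb, exp, factorial

def hyper_pmf(m, n, D, N):
    """Hypergeometric probability of m defectives in a sample of n from
    a lot of N articles containing D defectives (assumed form of (a))."""
    if m < 0 or D < 0 or m > min(n, D) or n - m > N - D:
        return 0.0
    return comb(D, m) * comb(N - D, n - m) / comb(N, n)

def poisson_pmf(m, mean):
    """Poisson law e^{-pn} (pn)^m / m! with mean = pn."""
    return exp(-mean) * mean ** m / factorial(m)

N, n, D = 10000, 200, 100        # hypothetical lot: p = 0.01, n/N = 0.02
mean = n * D / N                 # pn = 2
# Largest pointwise discrepancy over m = 0, 1, ..., 10.
max_gap = max(abs(hyper_pmf(m, n, D, N) - poisson_pmf(m, mean))
              for m in range(11))
```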

10.52 Double Sampling Inspection 

In double sampling inspection from a given lot of size N, the procedure for
taking action regarding a given lot is as follows:

(1) A first sample of size $n_1$ is drawn from the lot.

(2) If the number of defectives is ≤ $c_1$, the lot is accepted without further sampling.

(3) If the number of defectives in the first sample exceeds $c_2$, inspect the remainder
of the lot.

(4) If the number of defectives in the first sample exceeds $c_1$ but not $c_2$, inspect
a second sample of $n_2$ pieces.

(5) If the total number of defectives in both samples does not exceed $c_2$, accept the
lot.

(6) If the total number of defectives in both samples exceeds $c_2$, inspect the
remainder of the lot.

(7) Replace all defectives found by conforming articles.

As in the case of single sampling, we have two kinds of consumer protection:

(i) Lot quality protection, and (ii) Average quality protection.

Consumer risk, the probability of accepting a lot with fraction defective $p_t$
without 100% inspection, is given by

(a) $P_c = \sum_{m=0}^{c_1} P_{m,n_1;p_tN,N} + \sum_{i=1}^{c_2-c_1}\sum_{m=0}^{c_2-c_1-i} P_{c_1+i,n_1;p_tN,N}\,P_{m,n_2;p_tN-c_1-i,N-n_1}$.

The single sum in this formula is simply the probability of accepting the lot on the basis of
the first sample (i.e. Step (2)) and the double sum is the probability of accepting the
lot on the basis of the first and second samples combined (i.e. Step (5)), after having
failed to accept on the basis of the first sample alone.

The mean value of the fraction of defectives per lot remaining after the defectives
have been removed by the double sampling procedure, for lots having fraction defective
p originally, is given by

(b) $\tilde{p} = \sum_{m=0}^{c_1}\frac{Np-m}{N}\,P_{m,n_1;pN,N} + \sum_{i=1}^{c_2-c_1}\sum_{m=0}^{c_2-c_1-i}\frac{Np-(c_1+i+m)}{N}\,P_{c_1+i,n_1;pN,N}\,P_{m,n_2;pN-c_1-i,N-n_1}$.

The mean value of the number of articles inspected per lot for lots having
fraction defective p is

(c) $I = n_1 + n_2\left(1 - \sum_{m=0}^{c_1} P_{m,n_1;pN,N}\right) + (N-n_1-n_2)(1-P_a)$,

where $P_a$ is the value of the probability given in (a) with $p_t$ replaced by p.
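
The double sampling formulas can be sketched in the same way. The code below assumes the hypergeometric form for the probabilities $P_{m,n;D,N}$ and uses a hypothetical plan ($N = 1000$, $n_1 = 50$, $n_2 = 100$, $c_1 = 1$, $c_2 = 4$); the function names are illustrative.

```python
from math import comb

def P(m, n, D, N):
    """Hypergeometric probability of m defectives in a sample of n from
    a lot of N articles containing D defectives (assumed form)."""
    if m < 0 or D < 0 or m > min(n, D) or n - m > N - D:
        return 0.0
    return comb(D, m) * comb(N - D, n - m) / comb(N, n)

def accept_first(c1, n1, D, N):
    """Probability of acceptance on the first sample alone (Step (2))."""
    return sum(P(m, n1, D, N) for m in range(c1 + 1))

def accept_prob(c1, c2, n1, n2, D, N):
    """Single sum plus double sum of formula (a), with D defectives."""
    second = sum(P(c1 + i, n1, D, N) * P(m, n2, D - c1 - i, N - n1)
                 for i in range(1, c2 - c1 + 1)
                 for m in range(c2 - c1 - i + 1))
    return accept_first(c1, n1, D, N) + second

def outgoing_quality(c1, c2, n1, n2, D, N):
    """Formula (b): mean fraction defective left after inspection."""
    first = sum((D - m) / N * P(m, n1, D, N) for m in range(c1 + 1))
    second = sum((D - (c1 + i + m)) / N
                 * P(c1 + i, n1, D, N) * P(m, n2, D - c1 - i, N - n1)
                 for i in range(1, c2 - c1 + 1)
                 for m in range(c2 - c1 - i + 1))
    return first + second

def mean_inspected(c1, c2, n1, n2, D, N):
    """Formula (c): mean number of articles inspected per lot."""
    return (n1 + n2 * (1 - accept_first(c1, n1, D, N))
            + (N - n1 - n2) * (1 - accept_prob(c1, c2, n1, n2, D, N)))

N, n1, n2, c1, c2 = 1000, 50, 100, 1, 4      # hypothetical plan
consumer_risk = accept_prob(c1, c2, n1, n2, 50, N)   # p_t = 0.05
```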

For given values of N, $n_1$, $n_2$, $c_1$, $c_2$, it is clear that $\tilde{p}$ is a function of p,
defined for those values of p for which Np is an integer, and has a maximum value $\tilde{p}_L$,
the average outgoing quality limit. For a given value of N there are many values of $n_1$,
$n_2$, $c_1$, $c_2$ which will yield the same value of $\tilde{p}_L$ (approximately), or will yield the same
consumer risk (approximately) for a given lot tolerance fraction defective. Dodge and
Romig have arbitrarily chosen as the basis for the relationship between the n's and the c's
the following rule: To determine $n_1$ and $n_2$ such that for given values of $c_1$ and $c_2$,
$n_1$ and $c_1$ (as sample size and allowable defect number) provide the same consumer risk
(approximately) as $n_1 + n_2$ and $c_2$ (as sample size and allowable defect number). The sense
in which "approximately" is used is due to nearest integer restrictions. Even after this
restriction there is enough choice left for combinations of $n_1$, $n_2$, $c_1$, $c_2$ to minimize I
as given by (c). To determine the n's and c's under these conditions for given N, for
given consumer risk (or average outgoing quality), involves a considerable amount of
computation. Dodge and Romig have prepared tables for double sampling analogous to those
described at the end of §10.51 for single sampling.

For a given amount of consumer protection, a smaller average amount of inspection
is required under double sampling than under single sampling, particularly for
large lots and low process average fraction defective $\bar{p}$.

AN INTRODUCTION TO MULTIVARIATE STATISTICAL ANALYSIS


A considerable amount of work has been done in recent years in the theory of
sampling from normal multivariate populations and in the theory of testing statistical
hypotheses relating to normal multivariate distributions. The two basic distribution
functions underlying all of this work are the sample mean distribution, (e) in §5.12, and
the Wishart distribution, (k) in §5.6, of the second order sample moments. We have given
a derivation of the distribution of means (§5.12) and a derivation of the Wishart
distribution for the case of samples from a bivariate normal population (§5.12). The general
Wishart distribution was given in §5.5, without proof.

In the present chapter we shall present a geometric derivation* of the Wishart
distribution, and consider applications of this distribution in deriving sampling
distributions of several multivariate statistical functions and test criteria. The few sections
which follow must be considered merely as an introduction to normal multivariate
statistical theory. The reader interested in further material in this field is referred to the
Bibliography for supplementary reading.

11.1 The Wishart Distribution

In §5.5, we presented a derivation of the joint distribution of the second order
moments in samples from a bivariate distribution. The general Wishart distribution was
stated in (k) of §5.5. We shall now present a derivation of this distribution.

Let $O_n$: $(x_{11}, x_{21}, \ldots, x_{k1};\ x_{12}, x_{22}, \ldots, x_{k2};\ \ldots;\ x_{1n}, x_{2n}, \ldots, x_{kn})$ be a sample of
n observations from the k-variate population having a p.d.f.

(a) $\frac{\sqrt{A}}{(2\pi)^{k/2}}\, e^{-\frac{1}{2}\sum_{i,j=1}^{k} A_{ij}x_i x_j}$,


*John Wishart, "The Generalized Product Moment Distribution in Samples from a Normal
Multivariate Population", Biometrika, Vol. 20A, pp. 32-52. A proof based on the method of
characteristic functions has also been given by J. Wishart and M. S. Bartlett, "The
Generalized Product Moment Distribution", Proc. Camb. Phil. Soc., Vol. 29 (1933) pp. 260-270.




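where A is the determinant of the positive definite matrix $||A_{ij}||$. Let

(b) $b_{ij} = \sum_{\alpha=1}^{n} x_{i\alpha}x_{j\alpha}$, (i,j = 1,2,...,k).

Clearly $b_{ij} = b_{ji}$, so that there are only k(k+1)/2 distinct $b_{ij}$. The $b_{ij}$ may be
referred to as second order sample moments. Our problem is to obtain the joint p.d.f. of
the $b_{ij}$. The joint p.d.f. of the $x_{i\alpha}$ (i = 1,2,...,k; α = 1,2,...,n) is given by

(c) $\frac{A^{n/2}}{(2\pi)^{nk/2}}\, e^{-\frac{1}{2}\sum_{i,j=1}^{k} A_{ij}b_{ij}}$.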


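Now, the probability element of the $b_{ij}$ is given by

(d) $P\left(b_{ij} < \sum_{\alpha} x_{i\alpha}x_{j\alpha} < b_{ij}+db_{ij};\ i \le j = 1,2,\ldots,k\right) = \frac{A^{n/2}}{(2\pi)^{nk/2}}\int_R e^{-\frac{1}{2}\sum_{i,j} A_{ij}\sum_{\alpha} x_{i\alpha}x_{j\alpha}} \prod_{i,\alpha} dx_{i\alpha}$,

where R is the region in the kn-dimensional space of the $x_{i\alpha}$ for which

(e) $b_{ij} < \sum_{\alpha} x_{i\alpha}x_{j\alpha} < b_{ij} + db_{ij}$, (i ≤ j = 1,2,...,k).

Within terms of order $\prod_{i \le j} db_{ij}$, the probability given by (d) may be written as

(f) $\frac{A^{n/2}}{(2\pi)^{nk/2}}\, e^{-\frac{1}{2}\sum_{i,j} A_{ij}b_{ij}} \int_R \prod_{i,\alpha} dx_{i\alpha}$.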




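Our problem now reduces to the integration of $\prod_{i,\alpha} dx_{i\alpha}$ over the region R. Let
$f_1(b_{11})\,db_{11}$ be the volume element for which $b_{11} < \sum_{\alpha} x_{1\alpha}^2 < b_{11} + db_{11}$;
$f_2(b_{21},b_{22}|b_{11})\,db_{21}db_{22}$ the volume element for which
$b_{2j} < \sum_{\alpha} x_{2\alpha}x_{j\alpha} < b_{2j} + db_{2j}$ (j = 1,2), for a fixed value of $b_{11}$;
with a similar meaning for $f_3(b_{31},b_{32},b_{33}|b_{11},b_{21},b_{22})\,db_{31}db_{32}db_{33}$, and so on. Then the
volume element for which $b_{ij} < \sum_{\alpha} x_{i\alpha}x_{j\alpha} < b_{ij} + db_{ij}$, i.e. the integral in (f) (to
terms of order $\prod db_{ij}$), is given by the product

(g) $f_1(b_{11})\,f_2(b_{21},b_{22}|b_{11}) \cdots f_k(b_{k1},b_{k2},\ldots,b_{kk}|b_{11},b_{21},b_{22},\ldots,b_{k-1,k-1}) \prod_{i \le j} db_{ij}$.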


Now, consider the problem of determining the expression for

(h) $f_m(b_{m1},b_{m2},\ldots,b_{mm}|b_{11},\ldots,b_{m-1,m-1})\,db_{m1}db_{m2}\cdots db_{mm}$.





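We note that the $b_{ij}$, i ≤ j = 1,2,...,m-1, are fixed. Geometrically, we may represent
$P_i$: $(x_{i1}, x_{i2}, \ldots, x_{in})$, i = 1,2,...,k, as k points in an n-dimensional space. $\sqrt{b_{ii}}$
is the distance between the i-th point and the origin O, while $b_{ij}/\sqrt{b_{ii}b_{jj}}$ is the
cosine of the angle between the vectors $OP_i$ and $OP_j$. Fixing the $b_{ij}$, i ≤ j = 1,2,...,m-1,
means fixing the relative position of the vectors $OP_1, OP_2, \ldots, OP_{m-1}$. The vector $OP_m$ is
free to vary in such a way that

(i) $b_{mi} < \sum_{\alpha=1}^{n} x_{m\alpha}x_{i\alpha} < b_{mi} + db_{mi}$, (i = 1,2,...,m),

and we wish to find the volume of the region over which $P_m$ is free to vary. If n = m,
we have as many vectors as dimensions and we can find our volume element by making the
transformation

$\sum_{\alpha=1}^{m} x_{m\alpha}x_{i\alpha} = b_{mi}$, (i = 1,2,...,m).

The Jacobian is

$\left|\frac{\partial b_{mi}}{\partial x_{m\alpha}}\right| = \begin{vmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & & & \vdots \\ 2x_{m1} & 2x_{m2} & \cdots & 2x_{mm} \end{vmatrix} = 2|x_{i\alpha}|$,

where $|x_{i\alpha}|$ is the determinant of the $x_{i\alpha}$ (i, α = 1,2,...,m).

The absolute value of the determinant $|x_{i\alpha}|$ is the volume of the parallelotope based on
the edges $OP_1, OP_2, \ldots, OP_m$. By taking the positive square root of the square of this
determinant, we may overcome the difficulty of sign. Thus

$|x_{i\alpha}|^2 = \left|\sum_{\alpha} x_{i\alpha}x_{j\alpha}\right| = |b_{ij}|$,

and hence

$\left|\frac{\partial b_{mi}}{\partial x_{m\alpha}}\right| = 2\sqrt{|b_{ij}|}$.

Therefore, we have

(k) $\prod_{\alpha=1}^{m} dx_{m\alpha} = \frac{1}{2\sqrt{|b_{ij}|}} \prod_{i=1}^{m} db_{mi}$.

Hence the differential element on the right in (k), obtained by taking all values of $x_{m\alpha}$
for which

$b_{mi} < \sum_{\alpha} x_{m\alpha}x_{i\alpha} < b_{mi} + db_{mi}$,

is a function of the volume of the parallelotope and the differentials $db_{mi}$ in the values
of the $b_{mi}$.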

It can be shown* that $\sqrt{|b_{ij}|}$ (i,j = 1,2,...,m) is the volume of the parallelotope based on the
edges $OP_1, OP_2, \ldots, OP_m$ for any number of dimensions n ≥ m. If n exceeds m, then $P_m$ is
free to vary within an (n-m+1)-dimensional spherical shell, as will be noted by examining
the inequalities in (i). One of these inequalities (i = m) represents an n-dimensional
spherical shell whose thickness is determined by $db_{mm}$, the remaining inequalities
representing pairs of parallel (n-1)-dimensional planes, where in general no two pairs are
parallel to each other. The volume included between any arbitrary pair of planes, e.g.,
$b_{m1} < \sum_{\alpha} x_{m\alpha}x_{1\alpha} < b_{m1} + db_{m1}$, is an n-dimensional slab of thickness $db_{m1}/\sqrt{b_{11}}$. The
intersection of the (m-1) pairs of (n-1)-dimensional planes and the n-dimensional spherical
shell yields an (n-m+1)-dimensional spherical shell. Now the inner surface of this shell
(or any spherical surface concentric with the inner surface) is perpendicular to the
differentials $db_{m1}, db_{m2}, \ldots$, as is evident upon examining the manner in which
the (n-m+1)-dimensional spherical shell mentioned above is obtained as the common
intersection of the m-1 parallel pairs of (n-1)-dimensional planes and the n-dimensional
spherical shell. Therefore, the rectangular volume element $db_{m1}db_{m2}\cdots db_{mm}$ is
perpendicular to the inner surface of the (n-m+1)-dimensional spherical shell. The thickness
of this shell is given by the differential element $\frac{1}{2\sqrt{|b_{ij}|}}\prod_{i=1}^{m} db_{mi}$ in (k). Therefore,
by multiplying this thickness by the inner surface content of our (n-m+1)-dimensional
shell, we obtain the volume of the shell to terms of order $\prod_{i=1}^{m} db_{mi}$. The radius of this
inner surface is equal to the distance h from $P_m$ to the (m-1)-dimensional space formed by
$OP_1, OP_2, \ldots, OP_{m-1}$. This is, perhaps, seen most readily by noting that the inner surface
of our shell is obtained by taking the intersections of $\sum_{\alpha} x_{m\alpha}x_{i\alpha} = b_{mi}$, i = 1,2,...,m
(we are assuming, of course, that all $db_{mi}$ are > 0). The center of the sphere having
this surface must clearly lie in the intersection of the (m-1) (n-1)-dimensional planes
$\sum_{\alpha} x_{m\alpha}x_{i\alpha} = b_{mi}$, i = 1,2,...,m-1. This intersection point lies on each of the vectors
$OP_1, OP_2, \ldots, OP_{m-1}$, and the line between this point and $P_m$ is perpendicular to each of the
first m-1 vectors, which is equivalent to the statement that the center of the (n-m+1)-
dimensional shell is at the point where the perpendicular from $P_m$ intersects the (m-1)-
dimensional space formed by the remaining m-1 vectors, $OP_1, OP_2, \ldots, OP_{m-1}$. The volume
of $T_m$, the parallelotope formed from $OP_1, OP_2, \ldots, OP_m$, is $\sqrt{|b_{ij}|} = V_m$, say, and that of
$T_{m-1}$, the parallelotope formed from $OP_1, OP_2, \ldots, OP_{m-1}$, is $\sqrt{|b_{\alpha\beta}|} = V_{m-1}$, α,β = 1,2,...,m-1.
Using $T_{m-1}$ as the base of $T_m$ and h as the height, we must have $V_m = hV_{m-1}$, or
$h = V_m/V_{m-1}$.

*For example, see D. M. Y. Sommerville, An Introduction to the Geometry of n Dimensions,
Methuen, London (1929), Chapter 8.

There is also another geometrical interpretation of $|b_{ij}|$ for any n ≥ m, which is
of considerable interest. The $x_{i\alpha}$ (i = 1,2,...,m; α = 1,2,...,n) may be regarded as n
points $P_\alpha$ (α = 1,2,...,n) in m dimensions. If we take any m of these n points, say
$P_{\alpha_r}$: $(x_{i\alpha_r},\ i = 1,2,\ldots,m)$, (r = 1,2,...,m), together with the origin O as the (m+1)-st
point, then the square of the volume of the parallelotope based on $OP_{\alpha_1}, OP_{\alpha_2}, \ldots, OP_{\alpha_m}$
is given by $\left|\sum_{r=1}^{m} x_{i\alpha_r}x_{j\alpha_r}\right|$. This follows from the discussion between (i) and (j). Now
there are $\binom{n}{m}$ ways of choosing m points from the n points $P_\alpha$, and hence there are $\binom{n}{m}$
parallelotopes which can be formed in a manner similar to that discussed above. It can
be shown that $|b_{ij}| = \sum \left|\sum_{r=1}^{m} x_{i\alpha_r}x_{j\alpha_r}\right|$, where $\sum$ denotes summation for all
parallelotopes thus formed. The proof of this follows by mathematical induction by increasing
the number of points from m to n successively by unity. In the case i = j = 1, we have n
points in one dimension and $|b_{ij}| = b_{11} = \sum_{\alpha} x_{1\alpha}^2$, the sum of squares of the distances of
each point from the origin. In the case of i,j = 1,2,...,k, $|b_{ij}|$ is the sum of squares
of the volumes of all m-dimensional parallelotopes which can be constructed from the
n given points, using the origin as one vertex in each parallelotope. $|b_{ij}|$ may be
referred to as the generalized sum of squares.
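
The interpretation of $|b_{ij}|$ as a generalized sum of squares can be checked on a small example. The sketch below uses exact rational arithmetic and a hypothetical set of n = 4 observations on m = 2 variates; the helper `det` is an illustrative implementation, not part of the text.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    """Determinant by Gaussian elimination over exact fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    size, sign, result = len(A), 1, Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign
        result *= A[col][col]
        for r in range(col + 1, size):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return sign * result

# x[i][alpha]: m = 2 variates, n = 4 observations (hypothetical data).
x = [[1, 2, 0, -1],
     [3, 1, 4, 2]]
m, n = len(x), len(x[0])

# Second order sample moments b_ij = sum over alpha of x_i,alpha * x_j,alpha.
b = [[sum(x[i][a] * x[j][a] for a in range(n)) for j in range(m)]
     for i in range(m)]

# Sum of squared volumes of the parallelotopes built on every choice of
# m of the n points together with the origin.
gen_sum_sq = sum(det([[x[i][a] for a in cols] for i in range(m)]) ** 2
                 for cols in combinations(range(n), m))
```

For these data both `det(b)` and `gen_sum_sq` equal 171, in agreement with the identity stated in the footnote.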

Now the volume of an n-dimensional sphere of radius r is

(l) $2^n \int_0^{r} \int_0^{\sqrt{r^2-x_1^2}} \cdots \int_0^{\sqrt{r^2-x_1^2-\cdots-x_{n-1}^2}} dx_n\,dx_{n-1}\cdots dx_1$,

and the surface content of the sphere is obtained by taking the derivative of this
expression with respect to r, which is found to be

(m) $\frac{2\pi^{n/2}\,r^{n-1}}{\Gamma(\frac{n}{2})}$.

The integral in (l) may be readily evaluated by integrating immediately with
respect to $x_n$, then setting

$x_{n-i} = \sqrt{\theta_i}\,\sqrt{r^2 - x_1^2 - \cdots - x_{n-i-1}^2}$,

i = 1,2,...,n-1, and integrating with respect to the appropriate θ at each stage.
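
Formula (m) is easily checked against familiar cases. The sketch below assumes the standard closed form $\pi^{n/2}r^n/\Gamma(\frac{n}{2}+1)$ for the volume, which is not written out in the text, and compares its derivative with (m).

```python
from math import gamma, pi, isclose

def sphere_volume(n, r):
    """Closed form of the n-dimensional volume (assumed standard result)."""
    return pi ** (n / 2) * r ** n / gamma(n / 2 + 1)

def sphere_surface(n, r):
    """Surface content (m): 2 pi^(n/2) r^(n-1) / Gamma(n/2)."""
    return 2 * pi ** (n / 2) * r ** (n - 1) / gamma(n / 2)

# Familiar special cases: circumference of a circle, area of a sphere.
circle = sphere_surface(2, 3.0)      # 2 * pi * r with r = 3
sphere = sphere_surface(3, 2.0)      # 4 * pi * r^2 with r = 2

# Central difference of the volume in five dimensions at r = 1.
step = 1e-6
numeric_derivative = (sphere_volume(5, 1 + step)
                      - sphere_volume(5, 1 - step)) / (2 * step)
```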


The surface content of the inner surface of our spherical shell is therefore

(n) $\frac{2\pi^{\frac{n-m+1}{2}}}{\Gamma(\frac{n-m+1}{2})}\left(\frac{V_m}{V_{m-1}}\right)^{n-m}$,

and the content of the spherical shell is obtained by multiplying expression (n) by the
thickness $\frac{1}{2V_m}\prod_{i=1}^{m} db_{mi}$. Therefore, we finally obtain as the expression for the function
in (h),

(o) $\frac{\pi^{\frac{n-m+1}{2}}}{\Gamma(\frac{n-m+1}{2})}\cdot\frac{V_m^{\,n-m-1}}{V_{m-1}^{\,n-m}} \prod_{i=1}^{m} db_{mi}$.


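Letting m take on the values 1,2,...,k in (o) and multiplying the results, we obtain the
following expression for (g):

(p) $\frac{\pi^{\frac{kn}{2}-\frac{k(k-1)}{4}}\,V_k^{\,n-k-1}}{\prod_{i=1}^{k}\Gamma(\frac{n+1-i}{2})} \prod_{i \le j} db_{ij}$,

which is the value of $\int_R \prod_{i,\alpha} dx_{i\alpha}$ to terms of order $\prod db_{ij}$. Since $V_k = \sqrt{|b_{ij}|}$, we therefore
finally obtain the Wishart distribution:

(q) $w_{n,k}(b_{ij};A_{ij}) \prod_{i \le j} db_{ij} = \frac{(A/2^k)^{\frac{n}{2}}\,|b_{ij}|^{\frac{n-k-1}{2}}}{\pi^{\frac{k(k-1)}{4}}\prod_{i=1}^{k}\Gamma(\frac{n+1-i}{2})}\, e^{-\frac{1}{2}\sum_{i,j=1}^{k} A_{ij}b_{ij}} \prod_{i \le j} db_{ij}$,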






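which is defined over the region in the $b_{ij}$ space for which $||b_{ij}||$ is positive
semi-definite, that is, over all values of the $b_{ij}$ for which $|b_{ij}|$ and all principal minors of
all orders are ≥ 0. In order for the distribution to exist it is clear that n+1-k > 0.
Since

$\int w_{n,k}(b_{ij};A_{ij}) \prod_{i \le j} db_{ij} = 1$,

where the integration is taken over the space of the $b_{ij}$, it is clear that

(r) $\int |b_{ij}|^{\frac{n-k-1}{2}}\, e^{-\frac{1}{2}\sum_{i,j} A_{ij}b_{ij}} \prod_{i \le j} db_{ij} = \frac{\pi^{\frac{k(k-1)}{4}}\prod_{i=1}^{k}\Gamma(\frac{n+1-i}{2})}{(A/2^k)^{\frac{n}{2}}}$.

Replacing $A_{ij}$ by $A_{ij} - 2\theta_{ij}$ ($\theta_{ij} = \theta_{ji}$) in (r), then multiplying the result by

$\frac{(A/2^k)^{\frac{n}{2}}}{\pi^{\frac{k(k-1)}{4}}\prod_{i=1}^{k}\Gamma(\frac{n+1-i}{2})}$,

we obtain the m.g.f. of the $b_{ii}$ and $2b_{ij}$ (i < j), which has the value

(s) $\phi(\theta_{ij}) = A^{\frac{n}{2}}\,|A_{ij} - 2\theta_{ij}|^{-\frac{n}{2}}$.

Similarly, the reader may verify that the m.g.f. of the $\sum_{\alpha} x_{i\alpha}^2$ and $2\sum_{\alpha} x_{i\alpha}x_{j\alpha}$ (i < j) as
determined from (c) by multiplying (c) by

$e^{\sum_{i,j}\theta_{ij}\sum_{\alpha} x_{i\alpha}x_{j\alpha}}$

and integrating over the entire kn-dimensional space of the x's is also given by (s).
Therefore, if one were given the function (q) in advance, one could argue by the
multivariate analogue of Theorem (B), §2.81, that it is the distribution function of the
$b_{ij}$ when the p.d.f. of the $x_{i\alpha}$ is given by (c).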

The Wishart distribution (q) may be regarded as a generalization of the $\chi^2$-
distribution to the case of vectors with k components. In fact for k = 1, the quantity
$A_{11}b_{11}$ is distributed according to the $\chi^2$-distribution with n degrees of freedom. In this
case $b_{11}$ is the sum of squares of the n sample values of $x_1$, while in the k-variate case
$b_{ii}$ is the sum of squares of the n sample values of $x_i$ (the i-th component of the
vector $x_1, x_2, \ldots, x_k$) and $b_{ij}$ (i ≠ j) is the inner product or bilinear form between the n
sample values of $x_i$ and $x_j$. As in the case of the $\chi^2$-distribution, the Wishart
distribution has a reproductive property to be considered in the next section.
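
The reduction to the $\chi^2$-distribution for k = 1 can be confirmed numerically. The sketch below takes the form of (q) with k = 1, namely $(A/2)^{n/2}\,b^{(n-2)/2}\,e^{-Ab/2}/\Gamma(\frac{n}{2})$, and compares it with the density of b implied by $A_{11}b_{11}$ following a $\chi^2$ law with n degrees of freedom; the numerical values of n and A are hypothetical.

```python
from math import gamma, exp, isclose

def wishart_pdf_k1(b, n, A):
    """Wishart density (q) specialised to k = 1."""
    return (A / 2) ** (n / 2) * b ** ((n - 2) / 2) * exp(-A * b / 2) / gamma(n / 2)

def chi2_pdf(u, n):
    """Chi-square density with n degrees of freedom."""
    return u ** (n / 2 - 1) * exp(-u / 2) / (2 ** (n / 2) * gamma(n / 2))

def density_of_b_from_chi2(b, n, A):
    """Density of b when A*b is chi-square with n degrees of freedom
    (change of variable u = A*b, du = A db)."""
    return A * chi2_pdf(A * b, n)

n, A = 7, 2.5                         # hypothetical values
grid = [0.1, 0.5, 1.0, 2.0, 5.0]
pairs = [(wishart_pdf_k1(b, n, A), density_of_b_from_chi2(b, n, A))
         for b in grid]
```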

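11.2 Reproductive Property of the Wishart Distribution

The reproductive property of Wishart distributions is very useful in multivariate
statistical theory, and it may be stated in the following theorem:

Theorem (A): Let $b_{ij}^{(1)}, b_{ij}^{(2)}, \ldots, b_{ij}^{(p)}$ (i,j = 1,2,...,k) be p systems of random
variables distributed independently according to Wishart distributions (p.d.f.'s)
$w_{n_1,k}(b_{ij}^{(1)};A_{ij}),\ w_{n_2,k}(b_{ij}^{(2)};A_{ij}),\ \ldots,\ w_{n_p,k}(b_{ij}^{(p)};A_{ij})$,
respectively. Let $b_{ij} = \sum_{t=1}^{p} b_{ij}^{(t)}$ and $n = \sum_{t=1}^{p} n_t$. Then the $b_{ij}$ are distributed
according to the Wishart p.d.f. $w_{n,k}(b_{ij};A_{ij})$.

To prove this theorem, we determine the m.g.f. of the $b_{ii}$ and $2b_{ij}$ (i < j). We have

$\phi(\theta_{ij}) = E\left(e^{\sum_{i,j}\theta_{ij}b_{ij}}\right) = E\left(e^{\sum_{i,j}\theta_{ij}b_{ij}^{(1)}}\right) E\left(e^{\sum_{i,j}\theta_{ij}b_{ij}^{(2)}}\right) \cdots E\left(e^{\sum_{i,j}\theta_{ij}b_{ij}^{(p)}}\right)$.

But

$E\left(e^{\sum_{i,j}\theta_{ij}b_{ij}^{(t)}}\right) = A^{\frac{n_t}{2}}\,|A_{ij} - 2\theta_{ij}|^{-\frac{n_t}{2}}$,

and therefore

$\phi(\theta_{ij}) = A^{\frac{n}{2}}\,|A_{ij} - 2\theta_{ij}|^{-\frac{n}{2}}$,

which is the m.g.f. for the Wishart p.d.f. $w_{n,k}(b_{ij};A_{ij})$, which we conclude, by the
multivariate analogue of Theorem (B), §2.81, to be the distribution of the
$b_{ij} = \sum_{t=1}^{p} b_{ij}^{(t)}$.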

11.3 The Independence of Means and Second Order Moments in Samples from a
Normal Multivariate Population

Suppose $O_n$: $(x_{i\alpha};\ i = 1,2,\ldots,k;\ \alpha = 1,2,\ldots,n)$ is a sample from the normal
multivariate population having p.d.f.

(a) $\frac{\sqrt{A}}{(2\pi)^{k/2}}\, e^{-\frac{1}{2}\sum_{i,j=1}^{k} A_{ij}(x_i-a_i)(x_j-a_j)}$.

The p.d.f. of the sample is

(b) $\frac{A^{n/2}}{(2\pi)^{nk/2}}\, e^{-\frac{1}{2}\sum_{i,j=1}^{k} A_{ij}\left[a_{ij} + n(\bar{x}_i-a_i)(\bar{x}_j-a_j)\right]}$,

where $\bar{x}_i = \frac{1}{n}\sum_{\alpha=1}^{n} x_{i\alpha}$, and where

$a_{ij} = \sum_{\alpha=1}^{n} (x_{i\alpha}-\bar{x}_i)(x_{j\alpha}-\bar{x}_j)$.

The $a_{ij}$ are distributed according to the Wishart distribution (q), §11.1, with
n replaced by n-1. It was shown in §5.12 for the case k = 2 that the $\bar{x}_i$ are distributed
according to the normal bivariate law (d), §5.12, and it was remarked that in the general
case, the distribution of the $\bar{x}_i$ is given by (e), §5.12. The proof of (e), §5.12, may be
carried out by evaluating the m.g.f. of the $(\bar{x}_i - a_i)$, i.e. the integral of
$e^{\sum_{i}\theta_i(\bar{x}_i - a_i)}$ multiplied by the p.d.f. given by (b), the integration being over the
entire kn-dimensional space of the $x_{i\alpha}$. The evaluation of this integral may be carried out
as an extension of the case of k = 2, §5.12. The details are left to the reader. In order to
show that the $a_{ij}$ have the Wishart distribution with n replaced by n-1, it is sufficient to
show that the m.g.f. of the $a_{ii}$ and $2a_{ij}$ (i ≠ j) is $A^{\frac{n-1}{2}}\,|A_{ij} - 2\theta_{ij}|^{-\frac{n-1}{2}}$. The problem of
doing this is a direct extension to the k-variate case of the procedure followed for k = 2 in
§5.5. We shall have to leave the details to the reader.

Just as in the 1 and 2 variable cases discussed in §5.6, the $a_{ij}$ and $\bar{x}_i$
are independently distributed systems. A fairly direct verification of this, although
tedious, is to evaluate the joint m.g.f. of the $a_{ij}$ and $\bar{x}_i$ and note that it factors.


11.4 Hotelling's Generalized "Student" Test

Suppose a sample $O_n$ is drawn from a normal multivariate population with
distribution (a) in §11.3, and that it is desired to test the hypothesis $H(a_i = a_{i0})$ that the $a_i$
have specified values $a_{i0}$ (i = 1,2,...,k), no matter what values the $A_{ij}$ may have. This
hypothesis may be specified as follows:

(a) Ω: Space of the $A_{ij}$, $a_i$ such that $||A_{ij}||$ is positive definite
and $-\infty < a_i < +\infty$, i = 1,2,...,k.
ω: The subspace of Ω for which $a_i = a_{i0}$, i = 1,2,...,k.

It will be noted that this is the k-variate analogue of the "Student" statistical
hypothesis discussed in §7.2 for one variable, which is simply the hypothesis that a sample from
a normal population comes from one having a specified mean, no matter what the variance
may be.

The likelihood function for testing the hypothesis $H(a_i = a_{i0})$ is given by (b)
in §11.3. Maximizing the likelihood function for variations of the $A_{ij}$ and $a_i$ over Ω,
we find

(b) $\hat{a}_i = \bar{x}_i$, $||\hat{A}_{ij}|| = ||a_{ij}/n||^{-1}$,

and hence the maximum of the likelihood for variations of the parameters over Ω is

(c) $\frac{n^{\frac{nk}{2}}\,e^{-\frac{1}{2}nk}}{(2\pi)^{\frac{nk}{2}}\,|a_{ij}|^{\frac{n}{2}}}$.

Similarly, the maximum of the likelihood for variations of the parameters over
ω (i.e. for variations of the $A_{ij}$ and for $a_i = a_{i0}$) is found to be

(d) $\frac{n^{\frac{nk}{2}}\,e^{-\frac{1}{2}nk}}{(2\pi)^{\frac{nk}{2}}\,|c_{0ij}|^{\frac{n}{2}}}$,

where $c_{0ij} = \sum_{\alpha=1}^{n}(x_{i\alpha}-a_{i0})(x_{j\alpha}-a_{j0})$ is the corresponding quantity in (b), §11.3, with $a_i = a_{i0}$.

The likelihood ratio for testing $H(a_i = a_{i0})$ is the ratio of expression (d) to
expression (c), i.e.

(e) $\lambda = \frac{|a_{ij}|^{\frac{n}{2}}}{|c_{0ij}|^{\frac{n}{2}}}$.

Clearly, we may use $\lambda^{2/n} = Y$, say, as a test criterion for $H(a_i = a_{i0})$ since it is a
single-valued function of λ. To complete the derivation of our test, we must determine the
distribution of Y when $H(a_i = a_{i0})$ is true. We shall obtain this distribution by first finding
its moments. Now, we know from §11.1 that the joint distribution of the $c_{0ij}$ is the
Wishart distribution

(f) $\frac{(A/2^k)^{\frac{n}{2}}\,|c_{0ij}|^{\frac{n-k-1}{2}}}{\pi^{\frac{k(k-1)}{4}}\prod_{i=1}^{k}\Gamma(\frac{n+1-i}{2})}\, e^{-\frac{1}{2}\sum_{i,j} A_{ij}c_{0ij}}$.


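The g-th moment of $|c_{0ij}|$ is obtained in the following way. Since the integral of the
function (f) over the space $S_c$ of the $c_{0ij}$ is unity, we have

(g) $\int_{S_c} |c_{0ij}|^{\frac{n-k-1}{2}}\, e^{-\frac{1}{2}\sum_{i,j} A_{ij}c_{0ij}} \prod_{i \le j} dc_{0ij} = \frac{\pi^{\frac{k(k-1)}{4}}\prod_{i=1}^{k}\Gamma(\frac{n+1-i}{2})}{(A/2^k)^{\frac{n}{2}}}$.

Replacing n by n+2g in (g), then multiplying by

$\frac{(A/2^k)^{\frac{n}{2}}}{\pi^{\frac{k(k-1)}{4}}\prod_{i=1}^{k}\Gamma(\frac{n+1-i}{2})}$,

we obtain an expression on the left which defines $E(|c_{0ij}|^g)$ and its value is given on the
right. That is

(h) $E(|c_{0ij}|^g) = \frac{\prod_{i=1}^{k}\Gamma(\frac{n+1-i}{2}+g)}{(A/2^k)^{g}\prod_{i=1}^{k}\Gamma(\frac{n+1-i}{2})}$.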

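But the $c_{0ij}$ are functions of the $a_{ij}$ and $\bar{x}_i$, since

(i) $c_{0ij} = a_{ij} + n(\bar{x}_i - a_{i0})(\bar{x}_j - a_{j0})$.

Therefore,

(j) $\int |a_{ij} + n(\bar{x}_i-a_{i0})(\bar{x}_j-a_{j0})|^{g} \left[\frac{(A/2^k)^{\frac{n-1}{2}}}{\pi^{\frac{k(k-1)}{4}}\prod_{i=1}^{k}\Gamma(\frac{n-i}{2})}\right] \frac{n^{\frac{k}{2}}A^{\frac{1}{2}}}{(2\pi)^{\frac{k}{2}}}\,|a_{ij}|^{\frac{n-k-2}{2}}\, e^{-\frac{1}{2}\sum_{i,j} A_{ij}\left(a_{ij}+n(\bar{x}_i-a_{i0})(\bar{x}_j-a_{j0})\right)} \prod_{i \le j} da_{ij} \prod_{i} d\bar{x}_i = \frac{\prod_{i=1}^{k}\Gamma(\frac{n+1-i}{2}+g)}{(A/2^k)^{g}\prod_{i=1}^{k}\Gamma(\frac{n+1-i}{2})}$.

Dividing both members of (j) by the expression in [ ], then replacing n by n+2h, except in
the distribution of the $\bar{x}_i$ (the n's here being easily removable by changing variables
$\sqrt{n}(\bar{x}_i - a_{i0}) = y_i$, say), then multiplying the resulting equation by [ ], we obtain, as the first
member, an expression defining $E(|c_{0ij}|^g\,|a_{ij}|^h)$ and its value is given by the second
member; thus

(k) $E(|c_{0ij}|^g\,|a_{ij}|^h) = \frac{\prod_{i=1}^{k}\left[\Gamma(\frac{n+1-i}{2}+g+h)\,\Gamma(\frac{n-i}{2}+h)\right]}{(A/2^k)^{g+h}\prod_{i=1}^{k}\left[\Gamma(\frac{n+1-i}{2}+h)\,\Gamma(\frac{n-i}{2})\right]}$.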

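Clearly this moment will exist for all integers g and h for which all arguments of the
gamma functions are > 0. Setting g = -h, we obtain as the h-th moment* of Y,

(l) $E(Y^h) = \prod_{i=1}^{k}\frac{\Gamma(\frac{n+1-i}{2})\,\Gamma(\frac{n-i}{2}+h)}{\Gamma(\frac{n+1-i}{2}+h)\,\Gamma(\frac{n-i}{2})} = \frac{\Gamma(\frac{n}{2})\,\Gamma(\frac{n-k}{2}+h)}{\Gamma(\frac{n}{2}+h)\,\Gamma(\frac{n-k}{2})}$.

This moment may be written as

(m) $\frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n-k}{2})\Gamma(\frac{k}{2})}\cdot\frac{\Gamma(\frac{n-k}{2}+h)\,\Gamma(\frac{k}{2})}{\Gamma(\frac{n}{2}+h)} = \frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n-k}{2})\Gamma(\frac{k}{2})}\int_0^1 x^{\frac{n-k}{2}+h-1}(1-x)^{\frac{k}{2}-1}\,dx$.

Therefore the h-th moment of Y (h = 0,1,2,...) is identical with the h-th moment of a
variable x having probability element

(n) $\frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n-k}{2})\Gamma(\frac{k}{2})}\, x^{\frac{n-k}{2}-1}(1-x)^{\frac{k}{2}-1}\,dx$.

It follows from Theorem (A), §2.76, on the uniqueness of distributions from moments that
Y is distributed according to the probability law (n).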
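
The collapse of the product in (l) to the moment of the law (n) can be verified numerically. In the sketch below the two functions evaluate, respectively, the product over i and the corresponding beta-type moment $\Gamma(\frac{n}{2})\Gamma(\frac{n-k}{2}+h)/[\Gamma(\frac{n}{2}+h)\Gamma(\frac{n-k}{2})]$; the values n = 10 and k = 3 are hypothetical.

```python
from math import lgamma, exp, isclose

def moment_product(n, k, h):
    """h-th moment of Y as the product over i = 1, ..., k in (l)."""
    log_value = sum(lgamma((n + 1 - i) / 2) + lgamma((n - i) / 2 + h)
                    - lgamma((n + 1 - i) / 2 + h) - lgamma((n - i) / 2)
                    for i in range(1, k + 1))
    return exp(log_value)

def moment_beta(n, k, h):
    """h-th moment of a variable with probability element (n),
    i.e. a beta law with parameters (n-k)/2 and k/2."""
    a, b = (n - k) / 2, k / 2
    return exp(lgamma(a + b) + lgamma(a + h) - lgamma(a) - lgamma(a + b + h))

n, k = 10, 3                                   # hypothetical values
checks = [(moment_product(n, k, h), moment_beta(n, k, h)) for h in range(5)]
mean_of_Y = moment_beta(n, k, 1)               # equals (n - k) / n
```

Working with `lgamma` rather than `gamma` avoids overflow for large n.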

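Making use of the fact that

$c_{0ij} = a_{ij} + n(\bar{x}_i - a_{i0})(\bar{x}_j - a_{j0})$

and letting

$y_i = \sqrt{n}\,(\bar{x}_i - a_{i0})$,

we may write

(o) $|c_{0ij}| = |a_{ij} + y_iy_j| = \begin{vmatrix} 1 & y_1 & y_2 & \cdots & y_k \\ 0 & a_{11}+y_1^2 & a_{12}+y_1y_2 & \cdots & a_{1k}+y_1y_k \\ 0 & a_{21}+y_2y_1 & a_{22}+y_2^2 & \cdots & a_{2k}+y_2y_k \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & a_{k1}+y_ky_1 & a_{k2}+y_ky_2 & \cdots & a_{kk}+y_k^2 \end{vmatrix}$.

*For more applications of the foregoing technique of finding moments of ratios of
determinants, see S. S. Wilks, "Certain Generalizations in the Analysis of Variance",
Biometrika, Vol. 24 (1932) pp. 471-494.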



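Multiplying the first row by $-y_1$ then adding to the second; multiplying the first row by
$-y_2$ and adding to the third; and so on, we may write the determinant as

(p) $\begin{vmatrix} 1 & y_1 & \cdots & y_k \\ -y_1 & a_{11} & \cdots & a_{1k} \\ -y_2 & a_{21} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ -y_k & a_{k1} & \cdots & a_{kk} \end{vmatrix}$.

It follows from the argument leading to expression (k), §5.25, that the expression (p) may
be written as $|a_{ij}|\left(1 + \sum_{i,j=1}^{k} a^{ij}y_iy_j\right)$, where $||a^{ij}|| = ||a_{ij}||^{-1}$. Substituting the value
of $y_i$ we are finally able to write

(q) $|c_{0ij}| = |a_{ij}|\left[1 + \frac{T^2}{n-1}\right]$,

where $T^2$ is Hotelling's* Generalized Student Ratio, which can be written down explicitly
in terms of the $a_{ij}$ and $\bar{x}_i$ in an obvious way. Hence

(r) $Y = \frac{1}{1 + T^2/(n-1)}$,

and the distribution of T can be found at once by applying the transformation (r) to the
probability element (n) (with x replaced by Y). The result is

$\frac{2\,\Gamma(\frac{n}{2})}{(n-1)^{\frac{1}{2}}\,\Gamma(\frac{n-k}{2})\Gamma(\frac{k}{2})}\cdot\frac{\left(T^2/(n-1)\right)^{\frac{k-1}{2}}}{\left(1 + T^2/(n-1)\right)^{\frac{n}{2}}}\,dT$.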
11.5 The Hypothesis of Equality of Means in Multivariate Normal Populations

Suppose $O_{n_t}$: $(x_{i\alpha}^t;\ i = 1,2,\ldots,k;\ \alpha = 1,2,\ldots,n_t;\ t = 1,2,\ldots,p)$ are p samples from the
normal k-variate populations

$\frac{\sqrt{A}}{(2\pi)^{k/2}}\, e^{-\frac{1}{2}\sum_{i,j=1}^{k} A_{ij}(x_i-a_i^t)(x_j-a_j^t)}$, (t = 1,2,...,p),

and that it is desired to test the following hypothesis:

*H. Hotelling, "The Generalization of Student's Ratio", Annals of Math. Stat., Vol. 2
(1931) pp. 359-378.

Ω: Space of the $A_{ij}$, $a_i^t$ such that $||A_{ij}||$ is positive definite
and $-\infty < a_i^t < \infty$, i = 1,2,...,k; t = 1,2,...,p.
ω: Subspace of Ω for which $a_i^1 = a_i^2 = \cdots = a_i^p = a_i$,
where $-\infty < a_i < \infty$, i = 1,2,...,k.

Denoting this hypothesis by $H(a_i^1 = a_i^2 = \cdots = a_i^p)$, it is simply the hypothesis that the samples
come from k-variate normal populations having identical sets of means, given that they
come from k-variate normal populations with the same variance-covariance matrix. It
should be noted that this hypothesis is the multivariate analogue of that treated in §9.1.




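Let

$\bar{x}_i^t = \frac{1}{n_t}\sum_{\alpha=1}^{n_t} x_{i\alpha}^t$, $a_{ij}^t = \sum_{\alpha=1}^{n_t}(x_{i\alpha}^t-\bar{x}_i^t)(x_{j\alpha}^t-\bar{x}_j^t)$, $a_{ij} = \sum_{t=1}^{p} a_{ij}^t$,

$m_{ij} = \sum_{t=1}^{p}\sum_{\alpha=1}^{n_t}(x_{i\alpha}^t-\bar{x}_i)(x_{j\alpha}^t-\bar{x}_j)$,

where

$\bar{x}_i = \frac{\sum_{t=1}^{p} n_t\bar{x}_i^t}{\sum_{t=1}^{p} n_t}$.

The $m_{ij}$ are the second-order product moments in the pool of all samples, and similarly
$\bar{x}_i$ is the mean of the i-th variate in the pool of all samples.

The likelihood function for all samples is

$\frac{A^{\frac{n}{2}}}{(2\pi)^{\frac{nk}{2}}}\, e^{-\frac{1}{2}\sum_{i,j} A_{ij}\sum_{t}\sum_{\alpha}(x_{i\alpha}^t-a_i^t)(x_{j\alpha}^t-a_j^t)}$,

where $n = \sum_{t=1}^{p} n_t$.

Maximizing the likelihood function for variations of all parameters over Ω, we obtain

$\hat{a}_i^t = \bar{x}_i^t$, $||\hat{A}_{ij}|| = ||a_{ij}/n||^{-1}$,

and the maximum of the likelihood turns out to be

(h) $\frac{n^{\frac{nk}{2}}\,e^{-\frac{1}{2}nk}}{(2\pi)^{\frac{nk}{2}}\,|a_{ij}|^{\frac{n}{2}}}$.

Similarly, maximizing the likelihood function for variations of the parameters over ω,
we obtain

$\hat{a}_i = \bar{x}_i$, $||\hat{A}_{ij}|| = ||m_{ij}/n||^{-1}$,

and the maximum of the function turns out to be

(j) $\frac{n^{\frac{nk}{2}}\,e^{-\frac{1}{2}nk}}{(2\pi)^{\frac{nk}{2}}\,|m_{ij}|^{\frac{n}{2}}}$.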


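Hence the likelihood ratio for testing $H(a_i^1 = a_i^2 = \cdots = a_i^p)$ is the ratio of (j) to (h), i.e.

$\lambda = \frac{|a_{ij}|^{\frac{n}{2}}}{|m_{ij}|^{\frac{n}{2}}}$.

Again we may use $\lambda^{2/n} = Z$, say, as our test criterion. To find the distribution
of Z, we proceed as in §11.4 by the method of moments. Noting that the $a_{ij}$ are
distributed according to the Wishart distribution $w_{n-p,k}(a_{ij};A_{ij})$, we have, similar to (h),
§11.4,

$E(|a_{ij}|^g) = \frac{\prod_{i=1}^{k}\Gamma(\frac{n-p+1-i}{2}+g)}{(A/2^k)^{g}\prod_{i=1}^{k}\Gamma(\frac{n-p+1-i}{2})}$.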


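Now it may be verified that

$m_{ij} = a_{ij} + \bar{m}_{ij}$,

where $\bar{m}_{ij} = \sum_{t=1}^{p} n_t(\bar{x}_i^t-\bar{x}_i)(\bar{x}_j^t-\bar{x}_j)$. Since the $a_{ij}$ are functions only of the $a_{ij}^t$, the
$a_{ij}^t$ and $\bar{x}_i^t$ being independently distributed systems, it follows that the $a_{ij}$ and the $\bar{x}_i^t$
are independently distributed systems. The $a_{ij}^t$ are distributed according to Wishart
distributions $w_{n_t-1,k}(a_{ij}^t;A_{ij})$, t = 1,2,...,p, and it follows from the reproductive property of
the Wishart distribution that the $a_{ij}$ are distributed according to $w_{n-p,k}(a_{ij};A_{ij})$.
Therefore by using the joint distribution of the $a_{ij}$ and $\bar{x}_i^t$ and following steps similar
to those yielding (k) in §11.4, we find

$E(|m_{ij}|^g\,|a_{ij}|^h) = \frac{\prod_{i=1}^{k}\left[\Gamma(\frac{n-i}{2}+g+h)\,\Gamma(\frac{n-p+1-i}{2}+h)\right]}{(A/2^k)^{g+h}\prod_{i=1}^{k}\left[\Gamma(\frac{n-i}{2}+h)\,\Gamma(\frac{n-p+1-i}{2})\right]}$.

The h-th moment of Z is given by setting g = -h. We find

$E(Z^h) = \prod_{i=1}^{k}\frac{\Gamma(\frac{n-i}{2})\,\Gamma(\frac{n-p+1-i}{2}+h)}{\Gamma(\frac{n-i}{2}+h)\,\Gamma(\frac{n-p+1-i}{2})}$.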


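It should be noted* that for the case of two samples (i.e. p = 2), the h-th moment of Z
reduces to

(p) $\frac{\Gamma(\frac{n-1}{2})\,\Gamma(\frac{n-k-1}{2}+h)}{\Gamma(\frac{n-1}{2}+h)\,\Gamma(\frac{n-k-1}{2})}$,

and hence the distribution of Z in this case is the same as that of Y with n replaced by
n-1. In the two-sample case, it should be remembered that $n = n_1 + n_2$, the sum of the
two sample numbers.

For the case of p = 3, the h-th moment of Z is

(q) $\frac{\Gamma(\frac{n-1}{2})\,\Gamma(\frac{n-2}{2})\,\Gamma(\frac{n-k-1}{2}+h)\,\Gamma(\frac{n-k-2}{2}+h)}{\Gamma(\frac{n-1}{2}+h)\,\Gamma(\frac{n-2}{2}+h)\,\Gamma(\frac{n-k-1}{2})\,\Gamma(\frac{n-k-2}{2})}$.

Making use of the formula

$\Gamma(z)\,\Gamma(z+\tfrac{1}{2}) = \frac{\sqrt{\pi}\,\Gamma(2z)}{2^{2z-1}}$,

(q) reduces to

$\frac{\Gamma(n-2)\,\Gamma(n-k-2+2h)}{\Gamma(n-2+2h)\,\Gamma(n-k-2)}$,

from which we infer the distribution of Z to be identical with that of $x^2$, where x is
distributed according to

(t) $\frac{\Gamma(n-2)}{\Gamma(n-k-2)\,\Gamma(k)}\, x^{n-k-3}(1-x)^{k-1}\,dx$.

Setting $Z = x^2$, we find $dx = \frac{dZ}{2\sqrt{Z}}$, and hence the distribution of Z for the case of
three samples is

(u) $\frac{\Gamma(n-2)}{2\,\Gamma(n-k-2)\,\Gamma(k)}\, Z^{\frac{n-k-4}{2}}(1-\sqrt{Z})^{k-1}\,dZ$.

The distributions for p = 4 and 5 turn out to be relatively simple also.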
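
The reductions for p = 2 and p = 3 can be checked numerically from the general product for $E(Z^h)$. The sketch below assumes the product form $\prod_{i=1}^{k}\Gamma(\frac{n-i}{2})\Gamma(\frac{n-p+1-i}{2}+h)/[\Gamma(\frac{n-i}{2}+h)\Gamma(\frac{n-p+1-i}{2})]$ and uses hypothetical values n = 20, k = 3.

```python
from math import lgamma, exp, isclose

def moment_Z(n, k, p, h):
    """h-th moment of Z as the general product over i = 1, ..., k."""
    log_value = sum(lgamma((n - i) / 2) + lgamma((n - p + 1 - i) / 2 + h)
                    - lgamma((n - i) / 2 + h) - lgamma((n - p + 1 - i) / 2)
                    for i in range(1, k + 1))
    return exp(log_value)

def moment_two_samples(n, k, h):
    """Reduced form (p) for p = 2."""
    return exp(lgamma((n - 1) / 2) + lgamma((n - k - 1) / 2 + h)
               - lgamma((n - 1) / 2 + h) - lgamma((n - k - 1) / 2))

def moment_three_samples(n, k, h):
    """Reduced form for p = 3 after applying the duplication formula:
    the 2h-th moment of x with probability element (t)."""
    return exp(lgamma(n - 2) + lgamma(n - k - 2 + 2 * h)
               - lgamma(n - 2 + 2 * h) - lgamma(n - k - 2))

n, k = 20, 3                          # hypothetical values
two = [(moment_Z(n, k, 2, h), moment_two_samples(n, k, h)) for h in range(4)]
three = [(moment_Z(n, k, 3, h), moment_three_samples(n, k, h)) for h in range(4)]
```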


n . 6 The Hypothesis of Independence of Seta of Variables In a Monnal Multi- 
variate Population 

Suppose 1-1, 2 , a-i, 2 ,...,n) Is a sample from a normal multivariate 

population with distribution (a) In § 11 . 3 . Let the variates be grouped into r groups . 
as follows; G, ; (x^ ,X 2 , . . . ,Xj^^ ), Q^: )/•..» OpJ 

(Xj^ +. +k where k - k,+k 2 +...-»k . The problem we wish to consider Is that 

of deriving a teat for the hypotheala that theae groupa of varlatea are mutually Indepen- 
dent, 1. e. that - 0 for all l,j not belonging to the same group of variates. Let 
I I denote the value of | lAj^jl I when all are 0 for 1 , j belonging to different 

groups of variates. The hyjKjthesls to be tested may then be specified as follows: 

I Space of the Aj^j such that | lAj^jl | Is positive 
(a) j definite and -00 < a^^ < +00. 

(o) 


Subspace of XL for «dilch 1 1 A, 


- lA, 


We denote this hypothesis by H( | lAj^jl I - | I ). Maximizing the likelihood function 

(b) In §11.2 for variations of the parameters overil, we find the maximum to be 


££ ft £ 

(2 |_li|2 


The ntolmum of the likelihood function for variations of the parameters under <jj la 


-^nk 


nk /„ai 
(2rt)2 


where a^j^ - a^^j If l,j belong to the same group of variates and a^j^ - 0 If l,j belong 


to different groupa of variates. Clearly Is equal to the product of r mutually ex 

elusive principal minora T^la. , I, the u-th minor being the detenninant of all a., 

i>»l u^u , . r 

associated with the u-th group of variates. Similarly lAiyl - 11 i I. The llkell- 

, , u-1 U'^U 

hood ratio for testing H{| lAj^jl I - IIA^j II) la, therefore. 


n 

.( o ),2 
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Denoting A by W, which may clearly be used as the teat criterion In place of A, we de- 
termine the distribution of W by the method of moments. 

It should be noted that if we factor $\sqrt{a_{ii}}$ out of the $i$-th row and $i$-th column
($i = 1, 2, \ldots, k$) of each of the two determinants, using the fact that
$a_{ij}/\sqrt{a_{ii}a_{jj}} = r_{ij}$, the sample correlation coefficient between the $i$-th and $j$-th
variates, we obtain

     $W = \frac{|r_{ij}|}{|r^0_{ij}|},$

where $r^0_{ii} = 1$, and $r^0_{ij} = r_{ij}$ if $i$ and $j$ both belong to the same group of variates, and
$r^0_{ij} = 0$ if $i$ and $j$ belong to different groups of variates.
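The two forms of W can be compared numerically. The following Python sketch assumes NumPy is available; the helper names (`block_diagonal_part`, `w_from_products`, `w_from_correlations`) and the simulated sample are illustrative only and are not part of the text. It computes W once from the product sums $a_{ij}$ and once from the correlations $r_{ij}$, for groups made of consecutive variates.

```python
import numpy as np

def block_diagonal_part(m, group_sizes):
    """Copy of m with every entry whose row and column lie in different groups set to 0."""
    out = np.zeros_like(m)
    start = 0
    for size in group_sizes:
        stop = start + size
        out[start:stop, start:stop] = m[start:stop, start:stop]
        start = stop
    return out

def w_from_products(x, group_sizes):
    """W = |a_ij| / |a0_ij| for an (n, k) data matrix x."""
    centered = x - x.mean(axis=0)
    a = centered.T @ centered                     # sample product sums a_ij
    return np.linalg.det(a) / np.linalg.det(block_diagonal_part(a, group_sizes))

def w_from_correlations(x, group_sizes):
    """W = |r_ij| / |r0_ij| using sample correlation coefficients."""
    r = np.corrcoef(x, rowvar=False)
    return np.linalg.det(r) / np.linalg.det(block_diagonal_part(r, group_sizes))

# Hypothetical sample: 50 observations on k = 5 variates in groups of sizes 2 and 3.
rng = np.random.default_rng(0)
sample = rng.normal(size=(50, 5))
sample[:, 3] += 0.8 * sample[:, 0]                # introduce dependence between the groups
sizes = [2, 3]
w_products = w_from_products(sample, sizes)
w_correlations = w_from_correlations(sample, sizes)
```

Both functions should return the same value up to rounding, since the factors $a_{ii}$ cancel between numerator and denominator.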

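To find the moments of W, let us divide the $a_{ij}$ into two classes: (A) those
for which $i$ and $j$ correspond to different groups of variates, and (B) all others. Let
the product of differentials of the $a_{ij}$ in (A) be $dV_A$, with a similar meaning for $dV_B$.
Now it is evident that if we integrate the Wishart distribution $w_{n-1,k}(a_{ij}; A^0_{ij})$ with
respect to all $a_{ij}$ in Class (A), we will obtain the product of Wishart distributions

     $\prod_{u=1}^{r} w_{n-1,k_u}(a_{i_u j_u}; A_{i_u j_u}),$

since this integration simply yields the joint distribution of the $a_{ij}$ in Class (B), which
we know to be independently distributed in sets $a_{i_u j_u}$ ($u = 1, 2, \ldots, r$) when
$||A_{ij}|| = ||A^0_{ij}||$, each set being distributed according to a Wishart law. Hence we must have

(e)  $\frac{|A^0_{ij}|^{\frac{n-1}{2}}}{2^{\frac{(n-1)k}{2}} \pi^{\frac{k(k-1)}{4}} \prod_{i=1}^{k} \Gamma(\frac{n-i}{2})} \int |a_{ij}|^{\frac{n-k-2}{2}} e^{-\frac{1}{2}\sum_{i,j} A^0_{ij} a_{ij}}\, dV_A$
     $= \prod_{u=1}^{r} \frac{|A_{i_u j_u}|^{\frac{n-1}{2}}\, |a_{i_u j_u}|^{\frac{n-k_u-2}{2}}\, e^{-\frac{1}{2}\sum_{i_u, j_u} A_{i_u j_u} a_{i_u j_u}}}{2^{\frac{(n-1)k_u}{2}} \pi^{\frac{k_u(k_u-1)}{4}} \prod_{i=1}^{k_u} \Gamma(\frac{n-i}{2})}.$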

Let both members of (e) be multiplied by $\prod_{u=1}^{r} |a_{i_u j_u}|^{-h}$ (which is constant as far
as the $a_{ij}$ in Class (A) are concerned), then replace $n$ by $n + 2h$ throughout (e), then
multiply throughout by

(f)  $\frac{2^{hk} \prod_{i=1}^{k} \Gamma(\frac{n-i}{2} + h)}{|A^0_{ij}|^{h} \prod_{i=1}^{k} \Gamma(\frac{n-i}{2})},$

then integrate with respect to the $a_{ij}$ in (B). It will be seen that the first member in
(e) after these operations will be the integral expression defining $E(W^h)$, and the second
member will be the value of $E(W^h)$. We find

(g)  $E(W^h) = \prod_{u=1}^{r} \prod_{i=1}^{k_u} \frac{\Gamma(\frac{n-i}{2})}{\Gamma(\frac{n-i}{2} + h)} \cdot \prod_{i=1}^{k} \frac{\Gamma(\frac{n-i}{2} + h)}{\Gamma(\frac{n-i}{2})}.$
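The moment formula (g) is a ratio of gamma functions and is easy to evaluate numerically. The sketch below uses only the Python standard library; the function name `moment_w` is an assumption made for illustration. It works on the logarithmic scale to avoid overflow, and requires $n > k$ so that every gamma argument is positive.

```python
from math import exp, lgamma

def moment_w(h, n, group_sizes):
    """h-th moment of W from formula (g), for sample size n and group sizes k_1, ..., k_r."""
    k = sum(group_sizes)
    log_moment = 0.0
    # Product over i = 1, ..., k of Gamma((n-i)/2 + h) / Gamma((n-i)/2)
    for i in range(1, k + 1):
        log_moment += lgamma((n - i) / 2 + h) - lgamma((n - i) / 2)
    # Product over groups u and i = 1, ..., k_u of Gamma((n-i)/2) / Gamma((n-i)/2 + h)
    for k_u in group_sizes:
        for i in range(1, k_u + 1):
            log_moment += lgamma((n - i) / 2) - lgamma((n - i) / 2 + h)
    return exp(log_moment)

# With r = 2, k_1 = 1, k_2 = k - 1 and h = 1 the formula reduces to (n - k)/(n - 1).
first_moment = moment_w(1.0, 20, [1, 3])
```

With a single group the two products cancel and the moment is 1 for every h, as it must be since W is then identically 1.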


As a special case, suppose we wish to test the hypothesis that $x_1$ is independent
of the set $x_2, x_3, \ldots, x_k$. In this case $r = 2$, $k_1 = 1$, $k_2 = k-1$. The W criterion is

     $W = \frac{|r_{ij}|}{r_{11}} = 1 - R^2,$

where $r_{11}$ is the minor of the element in the first row and column of $|r_{ij}|$, and $R$ is the
sample multiple correlation coefficient between $x_1$ and $x_2, x_3, \ldots, x_k$. The $h$-th moment of
W for this case is found from (g) to be

     $E(W^h) = \frac{\Gamma(\frac{n-1}{2})\, \Gamma(\frac{n-k}{2} + h)}{\Gamma(\frac{n-k}{2})\, \Gamma(\frac{n-1}{2} + h)}.$

Following the procedure used in inferring the distribution of Y in §11.4 from its $h$-th
moment, we find the probability element of W to be

     $\frac{\Gamma(\frac{n-1}{2})}{\Gamma(\frac{n-k}{2})\, \Gamma(\frac{k-1}{2})}\, W^{\frac{n-k-2}{2}} (1-W)^{\frac{k-3}{2}}\, dW.$

Setting $W = 1 - R^2$, we easily find the distribution law of $R^2$, the square of the sample
multiple correlation coefficient between $x_1$ and $x_2, x_3, \ldots, x_k$, to be

     $\frac{\Gamma(\frac{n-1}{2})}{\Gamma(\frac{n-k}{2})\, \Gamma(\frac{k-1}{2})}\, (R^2)^{\frac{k-3}{2}} (1-R^2)^{\frac{n-k-2}{2}}\, d(R^2),$

when the hypothesis of independence of $x_1$ and $x_2, x_3, \ldots, x_k$ is true, i.e. when the
$A_{1j} = 0$ ($j = 2, 3, \ldots, k$), which is equivalent to having the multiple correlation coefficient
equal to zero in the population. This result was first obtained by R. A. Fisher, who also
later* derived the distribution of $R$ in samples from a normal multivariate population
having an arbitrary multiple correlation coefficient.
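The identity $W = 1 - R^2$ for this special case can be checked on data. In the Python sketch below (NumPy assumed; the simulated sample and variable names are illustrative), W is computed from the correlation determinants, and $R^2$ is computed independently from a least-squares regression of $x_1$ on the remaining variates with a constant term.

```python
import numpy as np

# Hypothetical sample: n = 40 observations on k = 4 variates.
rng = np.random.default_rng(1)
n, k = 40, 4
x = rng.normal(size=(n, k))
x[:, 0] += 0.5 * x[:, 1] - 0.3 * x[:, 2]

# W for groups (x_1) and (x_2, ..., x_k): |r_ij| divided by the minor of r_11.
r = np.corrcoef(x, rowvar=False)
w = np.linalg.det(r) / np.linalg.det(r[1:, 1:])

# R^2 from regressing x_1 on x_2, ..., x_k (with intercept).
design = np.column_stack([np.ones(n), x[:, 1:]])
coef, *_ = np.linalg.lstsq(design, x[:, 0], rcond=None)
residual = x[:, 0] - design @ coef
total = np.sum((x[:, 0] - x[:, 0].mean()) ** 2)
r_squared = 1.0 - (residual @ residual) / total
```

The two routes agree because $|a_{ij}|/(a_{11}\,|a_{pq}|)$, with $p, q$ running over the second group, is the ratio of the residual sum of squares to the total sum of squares of $x_1$.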

Distributions of W for various special cases involving two and three groups of 
variates have been given by Wilks**. 

11.7 Linear Regression Theory in Normal Multivariate Populations 
The theorems and other results presented in Chapters VIII and IX can be extended 
to the case in which the dependent variable y is a vector with an arbitrary number of 
components (say $y_1, y_2, \ldots, y_s$), each component being distributed normally about a linear
function of the fixed variates $x_1, x_2, \ldots, x_k$. In this section we shall state without
proof*** the multivariate analogues of the important theorems in Chapter VIII. The details
of the proofs of these theorems are rather tedious and can be carried out as extensions
of the proofs for the case of one variable.

Suppose $y_1, y_2, \ldots, y_s$ are distributed according to the normal multivariate
distribution

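(a)  $\frac{\sqrt{|A_{ij}|}}{(2\pi)^{s/2}}\, e^{-\frac{1}{2}\sum_{i,j=1}^{s} A_{ij}(y_i - b_i)(y_j - b_j)},$

where

(b)  $b_i = \sum_{p=1}^{k} a_{ip} x_p,$

the $x_p$ being fixed variates. Let $O_n: (y_{i\alpha} | x_{p\alpha};\ i = 1, 2, \ldots, s;\ p = 1, 2, \ldots, k;\ \alpha = 1, 2, \ldots, n)$ be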

*R. A. Fisher, "The General Sampling Distribution of the Multiple Correlation Coefficient",
Proc. Roy. Soc. London, Vol. 121 (1928), pp. 654-673.

An alternative derivation has also been given by S. S. Wilks, "On the Sampling Distribution
of the Multiple Correlation Coefficient", Annals Math. Stat., Vol. 3 (1932), pp. 196-203.

**S. S. Wilks, "On the Independence of k Sets of Normally Distributed Statistical Variables",
Econometrica, Vol. 3 (1935), pp. 309-326.

***Proofs and extensions of many of the results may be found in one or more of the following
papers:

M. S. Bartlett, "On the Theory of Statistical Regression", Proc. Royal Soc. Edinburgh,
Vol. 53 (1933), pp. 260-283.

P. L. Hsu, "On Generalized Analysis of Variance", Biometrika, Vol. 31 (1940), pp. 221-237.

D. N. Lawley, "A Generalization of Fisher's z", Biometrika, Vol. 30 (1938), pp. 180-187.

W. G. Madow, "Contributions to the Theory of Multivariate Statistical Analysis", Trans.
Amer. Math. Soc., Vol. 44 (1938), pp. 454-495.

S. S. Wilks, "Moment-Generating Operators for Determinants of Product Moments in Samples
from a Normal System", Annals of Math., Vol. 35 (1934), pp. 312-340.

a sample from a population having distribution (a). The likelihood function associated
with this sample is




where 








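Clearly $c_{pq} = c_{qp}$. For a given value of $i$, let $\hat{a}_{ip}$ be the solution
of the equations


that is,

     $c_{iq} - \sum_{p=1}^{k} \hat{a}_{ip} c_{pq} = 0, \quad (q = 1, 2, \ldots, k),$

and let


Furthermore, let

     $Y_{i\alpha} = \sum_{p=1}^{k} \hat{a}_{ip} x_{p\alpha}.$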


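The essential functional and probability properties of the quantities defined
in (d), (e), (f), (g), (h) and (i) may be stated in the following theorems:

Theorem (A):

(j)  $\sum_{i,j=1}^{s} A_{ij} \sum_{\alpha=1}^{n} (y_{i\alpha} - \sum_{p=1}^{k} a_{ip}x_{p\alpha})(y_{j\alpha} - \sum_{p=1}^{k} a_{jp}x_{p\alpha})$
     $= \sum_{i,j=1}^{s} A_{ij} s_{ij} + \sum_{i,j=1}^{s} \sum_{p,q=1}^{k} A_{ij} c_{pq} (\hat{a}_{ip} - a_{ip})(\hat{a}_{jq} - a_{jq}).$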
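Theorem (C): If $O_n: (y_{i\alpha} | x_{p\alpha})$ is a sample from a population having distribution
(a), then if the $x_{p\alpha}$ are such that $||c_{pq}||$ is positive definite, the $s_{ij}$ are distributed
according to the Wishart distribution

     $w_{n-k,s}(s_{ij}; A_{ij}),$

and independently of the $\hat{a}_{ip}$ ($i = 1, 2, \ldots, s$; $p = 1, 2, \ldots, k$), which are distributed according
to the normal $ks$-variate distribution law

(l)  $\frac{\sqrt{D}}{(2\pi)^{ks/2}}\, e^{-\frac{1}{2}\sum_{i,j=1}^{s}\sum_{p,q=1}^{k} A_{ij} c_{pq} (\hat{a}_{ip} - a_{ip})(\hat{a}_{jq} - a_{jq})},$

where D is the $ks$-order determinant and has the value $|A_{ij}|^{k} \cdot |c_{pq}|^{s}$.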

The multivariate analogue of the general linear regression hypothesis stated in
§8.5 may be specified as follows:

(m)  $\Omega$: The space for which $||A_{ij}||$ is positive definite
     and $-\infty < a_{ip} < \infty$, $i = 1, 2, \ldots, s$; $p = 1, 2, \ldots, k$.

     $\omega$: The subspace of $\Omega$ for which $a_{ip} = a_{ip0}$,
     $i = 1, 2, \ldots, s$; $p = r+1, \ldots, k$.

Let us denote this hypothesis by $H(a_{ip} = a_{ip0})$. It is the hypothesis that the last $k-r$
regression coefficients corresponding to each $y_i$ ($i = 1, 2, \ldots, s$) have specified values $a_{ip0}$.
If the $a_{ip0} = 0$, our hypothesis is that each $y_i$ is independent of $x_{r+1}, \ldots, x_k$.

The likelihood ratio $\lambda$ for testing this hypothesis (as obtained by maximizing
the likelihood (c) for variations of the parameters over $\Omega$ and by maximizing for variations
of the parameters over $\omega$ and taking the ratio of the two maxima) turns out to be
given by $U^{n/2}$, where

(n)  $U = \frac{|s_{ij}|}{|s'_{ij}|}.$

The form of $s'_{ij}$ may be seen from the following considerations:

In view of Theorem (A), when the likelihood function (c) is maximized for variations
of the parameters over $\omega$, we may consider the maximizing process in two steps:
First, with respect to the $a_{ip}$ parameters over $\omega$ (holding the $A_{ij}$ fixed). Here we fix
$a_{ip} = a_{ip0}$ ($i = 1, 2, \ldots, s$; $p = r+1, \ldots, k$) in (j) and minimize the second term on the right
side of (j) with respect to $a_{ip}$ ($i = 1, 2, \ldots, s$; $p = 1, 2, \ldots, r$). The coefficient of $A_{ij}$ in the
right hand side of (j) after this minimizing step is $s'_{ij}$, where

(o)  $s'_{ij} = s_{ij} + m_{ij},$

where $m_{ij}$ results from the second term of the right hand side of (j). We next maximize
(c) for variations of the $A_{ij}$ after maximizing with respect to the $a_{ip}$ (over $\omega$). It
will be seen that the maximizing values of the $A_{ij}$ are obtainable after the first maximizing
step (i.e. with respect to the $a_{ip}$ over $\omega$), and are given by

     $||A_{ij}|| = ||s'_{ij}/n||^{-1}.$

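It will be noted that the form of $s'_{ij}$ is similar to that of $s_{ij}$, and is given
by

     $s'_{ij} = \sum_{\alpha=1}^{n} (y'_{i\alpha} - Y'_{i\alpha})(y'_{j\alpha} - Y'_{j\alpha}),$

where

     $y'_{i\alpha} = y_{i\alpha} - \sum_{p=r+1}^{k} a_{ip0}\, x_{p\alpha}, \qquad Y'_{i\alpha} = \sum_{p=1}^{r} a'_{ip}\, x_{p\alpha},$

and where the $a'_{ip}$ are given by solving the equations

     $c'_{iq} - \sum_{p=1}^{r} a'_{ip} c_{pq} = 0, \quad (q = 1, 2, \ldots, r),$

where

     $c'_{iq} = \sum_{\alpha=1}^{n} y'_{i\alpha} x_{q\alpha}.$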


The $m_{ij}$ in (o) are functions of the $\hat{a}_{ip}$, which are distributed independently of the $s_{ij}$.
In fact, it can be shown that the $m_{ij}$ are of the form $\sum_{u=1}^{k-r} \xi_{iu}\xi_{ju}$, where the $\xi_{iu}$
($i = 1, 2, \ldots, s$) are linear functions of the $\hat{a}_{ip}$ distributed according to

     $\frac{\sqrt{|A_{ij}|}}{(2\pi)^{s/2}}\, e^{-\frac{1}{2}\sum_{i,j=1}^{s} A_{ij}\xi_{iu}\xi_{ju}},$

and furthermore the sets $\xi_{iu}$ ($u = 1, 2, \ldots, k-r$) are independently distributed, and are
distributed independently of the $s_{ij}$, when $H(a_{ip} = a_{ip0})$ is true. If the $a_{ip0} = 0$
($i = 1, 2, \ldots, s$; $p = r+1, \ldots, k$) then it follows from Theorem (B) that U may be expressed as the
ratio of two determinants as follows:

(t)


'"!j' 


®ij 

®iq* 

®P’J 

®p*q* 


®p'q’* • 


(If J"1 f 2f • • • f 8f 


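Now the problem of determining the distribution of U when $H(a_{ip} = a_{ip0})$ is true
is, therefore, reduced to that of determining the distribution of the ratio of determinants

(u)  $U = \frac{|s_{ij}|}{|s_{ij} + \sum_{u=1}^{k-r} \xi_{iu}\xi_{ju}|},$

where the $s_{ij}$ are distributed according to the Wishart distribution

(v)  $w_{n-k,s}(s_{ij}; A_{ij}),$

and the $\xi_{iu}$ are distributed according to

(w)  $\frac{|A_{ij}|^{\frac{k-r}{2}}}{(2\pi)^{\frac{s(k-r)}{2}}}\, e^{-\frac{1}{2}\sum_{u=1}^{k-r}\sum_{i,j=1}^{s} A_{ij}\xi_{iu}\xi_{ju}},$

the $s_{ij}$ and $\xi_{iu}$ being independently distributed systems.

The simplest procedure for finding the distribution of U is perhaps by the
method of moments. The method of finding the moments of U is entirely similar to that of
finding the moments of Y and Z in §11.4 and §11.5, respectively. The $h$-th moment is
given by

(x)  $E(U^h) = \prod_{i=1}^{s} \frac{\Gamma(\frac{n-r+1-i}{2})\, \Gamma(\frac{n-k+1-i}{2} + h)}{\Gamma(\frac{n-k+1-i}{2})\, \Gamma(\frac{n-r+1-i}{2} + h)},$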

from which one may infer the distribution of U in any given case by methods illustrated
in §11.5. We may summarize our remarks in the following theorem, which is the multivariate
analogue of Theorem (A), §8.5.

Theorem (D): Let $O_n(y_{i\alpha} | x_{p\alpha})$ be a sample of size $n$ from the population having
distribution (c). Let $H(a_{ip} = a_{ip0})$ be the statistical hypothesis specified by (m), and
let $U = \lambda^{2/n}$, where $\lambda$ is the likelihood ratio for testing the hypothesis. Then

(y)  $U = \frac{|s_{ij}|}{|s'_{ij}|}, \quad (i, j = 1, 2, \ldots, s),$

where $s_{ij}$ is defined by (i), and $s'_{ij}$ by (o) and (g), and if $H(a_{ip} = a_{ip0})$ is true, the $h$-th
moment of U is given by (x).

It should be observed that U is a generalized form of the ratio

     $\frac{n\hat{\sigma}^2_{\Omega}}{n\hat{\sigma}^2_{\Omega} + n(\hat{\sigma}^2_{\omega} - \hat{\sigma}^2_{\Omega})}$

in Theorem (A), §8.3. In fact, when $s = 1$, then $s_{11} = n\hat{\sigma}^2_{\Omega}$ and $m_{11} = n(\hat{\sigma}^2_{\omega} - \hat{\sigma}^2_{\Omega})$.

It may be verified that Theorem (D) is general enough to cover multivariate
analogues of Case 1 (§8.41), Case 2 (§8.42) and Case 3 (§8.43). The essential point to be
noted in all of these cases is that $k$ represents the number of functionally independent
$a_{ip}$ (for each $i$) involved in specifying $\Omega$ and $r$ ($< k$) represents the number of functionally
independent $a_{ip}$ (for each $i$) involved in specifying $\omega$.
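To make the criterion concrete, the following Python sketch (NumPy assumed; the function names `residual_products` and `u_criterion` and the simulated data are illustrative, not taken from the text) computes $U = |s_{ij}|/|s'_{ij}|$ for the case $a_{ip0} = 0$. Here $s_{ij}$ is taken as the matrix of residual sums of products after regressing every $y_i$ on all $k$ fixed variates, and $s'_{ij}$ as the corresponding matrix after regressing on the first $r$ fixed variates only.

```python
import numpy as np

def residual_products(y, x):
    """Residual sums of squares and products after least-squares regression of y on x."""
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ coef
    return resid.T @ resid

def u_criterion(y, x, r):
    """U = |s_ij| / |s'_ij| for the hypothesis that the last k - r coefficients are zero."""
    s_full = residual_products(y, x)             # s_ij, parameters free over Omega
    s_reduced = residual_products(y, x[:, :r])   # s'_ij, last k - r coefficients set to 0
    return np.linalg.det(s_full) / np.linalg.det(s_reduced)

# Hypothetical data: n = 60 observations, k = 4 fixed variates, s = 3 components of y,
# generated so that the hypothesis (last k - r = 2 coefficients zero) is true.
rng = np.random.default_rng(2)
n, k, r, s = 60, 4, 2, 3
x = rng.normal(size=(n, k))
coefficients = rng.normal(size=(k, s))
coefficients[r:, :] = 0.0
y = x @ coefficients + rng.normal(size=(n, s))
u = u_criterion(y, x, r)
```

Since $s'_{ij} = s_{ij} + m_{ij}$ with $||m_{ij}||$ non-negative definite, U lies between 0 and 1, and for $s = 1$ it reduces to the ratio of the two residual sums of squares.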

11.8 Remarks on Multivariate Analysis of Variance Theory 

The application of normal linear regression theory to the various analysis of
variance problems discussed in Chapter IX can be extended in every instance to the case
in which the dependent variable y is a vector of several components. In all such multivariate
extensions, U in §11.7 plays the role in making significance tests analogous to
Snedecor's F (or $1/(1 + \frac{k-r}{n-k}F)$, to be more precise) in the single variable case (Theorem
(A), §8.3).

The reader will note that the problem treated in §11.5 is an example of multivariate
analysis of variance, and is the multivariate analogue of the problem treated in
§9.1.

To illustrate how the extension would be made in a randomized block layout
with $r'$ rows and $s'$ columns, let us consider the case in which y has two components $y_1$
and $y_2$. Let $y_{1ij}$ and $y_{2ij}$ be the values of $y_1$ and $y_2$ corresponding to the $i$-th row and
$j$-th column of our layout, $i = 1, 2, \ldots, r'$, $j = 1, 2, \ldots, s'$. The distribution assumption for the
$y_{1ij}$, $y_{2ij}$ is that $(y_{1ij} - m_1 - R_{1i} - C_{1j})$ and $(y_{2ij} - m_2 - R_{2i} - C_{2j})$, where
$\sum_i R_{1i} = \sum_j C_{1j} = \sum_i R_{2i} = \sum_j C_{2j} = 0$, are jointly distributed
according to a normal bivariate law with zero means and variance-covariance matrix

     $\begin{Vmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{Vmatrix}.$

Now suppose we wish to test the hypothesis that the "column effects" are zero
for both $y_1$ and $y_2$. This hypothesis may be specified as follows:

(a)  $\Omega$: $-\infty < m_1, m_2, R_{1i}, R_{2i}, C_{1j}, C_{2j} < \infty$ ($i = 1, 2, \ldots, r'$; $j = 1, 2, \ldots, s'$),
     $\begin{Vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{Vmatrix}$ positive definite.

     $\omega$: The subspace in $\Omega$ obtained by setting each $C_{1j}$ and each $C_{2j} = 0$.

This hypothesis, which may be denoted by $H[(C_{1j}, C_{2j}) = 0]$, is clearly the 2-variable
analogue of $H[(C_j) = 0]$ in §9.2.

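Let $\bar{y}_{1i.}$, $\bar{y}_{1.j}$, $\bar{y}_1$, $S_{11R}$, $S_{11C}$, $S_{11E}$ have meanings as functions of $y_{111}$,
$y_{112}, \ldots, y_{1r's'}$ similar to those of $\bar{y}_{i.}$, $\bar{y}_{.j}$, $\bar{y}$, $S_R$, $S_C$, $S_E$ as functions of $y_{11}, y_{12},$
$\ldots, y_{rs}$. Let $\bar{y}_{2i.}$, $\bar{y}_{2.j}$, $\bar{y}_2$, $S_{22R}$, $S_{22C}$, $S_{22E}$ have similar meanings as functions of
$y_{211}, y_{212}, \ldots, y_{2r's'}$. Let

     $S_{12R} = \sum_{i,j} (\bar{y}_{1i.} - \bar{y}_1)(\bar{y}_{2i.} - \bar{y}_2),$

(b)  $S_{12C} = \sum_{i,j} (\bar{y}_{1.j} - \bar{y}_1)(\bar{y}_{2.j} - \bar{y}_2),$

     $S_{12E} = \sum_{i,j} (y_{1ij} - \bar{y}_{1i.} - \bar{y}_{1.j} + \bar{y}_1)(y_{2ij} - \bar{y}_{2i.} - \bar{y}_{2.j} + \bar{y}_2).$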

It may be verified that the likelihood ratio $\lambda$ for testing the hypothesis
$H[(C_{1j}, C_{2j}) = 0]$ is given by $U^{r's'/2}$, where

(c)  $U = \frac{\begin{vmatrix} S_{11E} & S_{12E} \\ S_{12E} & S_{22E} \end{vmatrix}}{\begin{vmatrix} S_{11E}+S_{11C} & S_{12E}+S_{12C} \\ S_{21E}+S_{21C} & S_{22E}+S_{22C} \end{vmatrix}}.$

It follows from Theorem (D), §11.7, that the $h$-th moment of U when
$H[(C_{1j}, C_{2j}) = 0]$ is true is the special case of (x), §11.7, obtained by setting $s = 2$,
$k = r'+s'-1$, $r = r'$, $n = r's'$, i.e.

(d)  $E(U^h) = \frac{\Gamma(\frac{r'(s'-1)}{2})\, \Gamma(\frac{r'(s'-1)-1}{2})\, \Gamma(\frac{(r'-1)(s'-1)}{2} + h)\, \Gamma(\frac{(r'-1)(s'-1)-1}{2} + h)}{\Gamma(\frac{(r'-1)(s'-1)}{2})\, \Gamma(\frac{(r'-1)(s'-1)-1}{2})\, \Gamma(\frac{r'(s'-1)}{2} + h)\, \Gamma(\frac{r'(s'-1)-1}{2} + h)}.$

Using formula (r) in §11.5, this reduces to

(e)  $E(U^h) = \frac{\Gamma(r'(s'-1)-1)\, \Gamma((r'-1)(s'-1)-1+2h)}{\Gamma(r'(s'-1)-1+2h)\, \Gamma((r'-1)(s'-1)-1)},$

from which we can easily obtain the distribution of U by the method used in deriving the
distribution of Z in (u), §11.5.

The extension of the hypothesis specified in (a) and the corresponding U to
the case in which y has several components, say $y_1, y_2, \ldots, y_s$, is immediate. Similar
results hold for testing the hypothesis that "row effects" are zero.

We cannot go into further details. The illustration given above will perhaps
indicate how Theorem (D) can be used as a basis of significance tests for multivariate
analysis of variance problems arising in three-way layouts, Latin squares, Graeco-Latin
squares, etc.

11.9 Principal Components of a Total Variance

Suppose $x_1, x_2, \ldots, x_k$ are distributed according to the normal multivariate* law
(a) in §11.3. The probability density is constant on each member of the family or nest
of $k$-dimensional ellipsoids

(a)  $\sum_{i,j=1}^{k} A_{ij}(x_i - a_i)(x_j - a_j) = C,$

where $0 < C < \infty$. The ellipsoids in this family all have the same center $(a_1, a_2, \ldots, a_k)$
and are similarly situated with respect to their principal axes, that is, their longest
axes lie on the same line, their second longest axes lie on the same line, etc. (assuming
each has a longest, second longest, ..., axis).

Our problem here is to determine the directions of the various principal axes,
and the relative lengths of the principal axes for any given ellipsoid in the family (the
ratios of lengths are, in fact, the same for each member of the family). We must first
define analytically what is meant by principal axes. For convenience, we make the following
translation of coordinates

(b)  $x_i - a_i = y_i, \quad i = 1, 2, \ldots, k.$

The equation (a) now becomes

(c)  $\sum_{i,j=1}^{k} A_{ij} y_i y_j = C.$

If $P: (y_1, y_2, \ldots, y_k)$ represents any point on this ellipsoid, then the squared distance
between P and the center O is $\sum_{i=1}^{k} y_i^2$. Now if we allow P to move continuously over the
ellipsoid, there will, in general, be $2k$ points at which the rate of change of $\sum y_i^2$ with
respect to the coordinates of P will be zero, i.e. there are $2k$ extrema for $\sum y_i^2$ under
these conditions. These points occur in pairs, the points in each pair being symmetrically
located with respect to the center. The $k$ line segments connecting the points in
each pair are called principal axes. In the case of two variables, i.e. $k = 2$, our ellipsoids
are simply ellipses, and the principal axes are the major and minor axes. We
shall determine the points in the $k$-variate case and show that the principal axes are
mutually perpendicular.

*The theory of principal axes and principal components as discussed in this section (including
no sampling theory) can be carried through formally without assuming that the
random variables $x_1, x_2, \ldots, x_k$ are distributed according to a normal multivariate law.
However, this law is of sufficient interest to justify our use of it throughout the section.
Some sampling theory of principal components under the assumption of normality
will be presented in §11.11.

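It follows from §4.7 that the problem of finding the extrema of $\sum y_i^2$ for variations
of P over (c) is equivalent to finding unrestricted extrema of the function

(d)  $\phi = \sum_{i=1}^{k} y_i^2 + \lambda\,(C - \sum_{i,j=1}^{k} A_{ij} y_i y_j)$

for variations of the $y_i$ and $\lambda$. Following the Lagrange method, we must have

(e)  $\frac{\partial \phi}{\partial y_i} = 0, \quad (i = 1, 2, \ldots, k),$

and also equation (c) satisfied. Performing the differentiations in (e), we obtain the
following equations

(f)  $y_i - \lambda \sum_{j=1}^{k} A_{ij} y_j = 0, \quad (i = 1, 2, \ldots, k).$

Suppose we multiply the $i$-th equation by $A^{ih}$, $i = 1, 2, \ldots, k$, and sum with respect to $i$. We
have

(g)  $\sum_{i} A^{ih} y_i - \lambda \sum_{i,j} A^{ih} A_{ij} y_j = 0.$

Since $\sum_i A^{ih} A_{ij} = 1$ if $j = h$, and 0 if $j \neq h$, (g) reduces so that it may be written as

(h)  $\sum_{i} A^{ih} y_i - \lambda y_h = 0.$

Allowing $h$ to take values $1, 2, \ldots, k$, it is now clear that equations (h) are equivalent
to (f) for finding the extrema. In order that (h) have solutions other than zero, it is
necessary for

(i)  $\begin{vmatrix} A^{11} - \lambda & A^{12} & \cdots & A^{1k} \\ A^{21} & A^{22} - \lambda & \cdots & A^{2k} \\ \vdots & \vdots & & \vdots \\ A^{k1} & A^{k2} & \cdots & A^{kk} - \lambda \end{vmatrix} = 0.$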

Multiplying the $g$-th equation by $c_{gj}$, using the fact that $\sum_g c_{gi}c_{gj} = \delta_{ij}$ (a property
of mutually orthogonal vectors), and summing with respect to $g$, we find

(p)  $y_j = \sum_{g=1}^{k} c_{gj} z_g.$

Substituting in the equation of the ellipsoid (c), we have

(q)  $\sum_{g,h=1}^{k} \Big( \sum_{i,j=1}^{k} A_{ij} c_{gi} c_{hj} \Big) z_g z_h = C.$

Now it follows from the argument leading to (n) that $\sum_{i,j} A_{ij} c_{gi} c_{hj} = 0$ (if $g \neq h$) and
$= 1/\lambda_g$ (if $g = h$). Hence the equation of the ellipsoid in the new coordinates is

(r)  $\sum_{g=1}^{k} \frac{z_g^2}{\lambda_g} = C.$

The Jacobian of the transformation (p) is $|c_{gi}|$, which has the value 1, as one can see by
squaring the determinant. Hence, if the $(x_i - a_i)$ are distributed according to the normal
multivariate law (a), §11.3, and since (p) transforms the quadratic form (a) into (r),
then the $z_g$ are independently distributed with variance $\lambda_g$. But from (o) we also have,
by taking variances of both sides,

     $\lambda_g = \sum_{i,j=1}^{k} c_{gi} c_{gj} A^{ij}.$

Summing with respect to $g$, and using the fact that $\sum_g c_{gi}c_{gj} = \delta_{ij}$, we obtain

     $\sum_{g=1}^{k} \lambda_g = \sum_{i=1}^{k} A^{ii}.$

In other words the sum of the variances of the $y_i$ ($i = 1, 2, \ldots, k$) is equal to the sum of
the variances of the $z_g$ ($g = 1, 2, \ldots, k$). $\lambda_1, \lambda_2, \ldots, \lambda_k$ are called principal components of
the total variance. It will be observed that $z_g$ is constant on $(k-1)$-dimensional planes
perpendicular to the $g$-th principal axis, $g = 1, 2, \ldots, k$.


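We may summarize in the following

Theorem (A): Let $y_1, y_2, \ldots, y_k$ be random variables distributed according to the
normal multivariate distribution

(s)  $\frac{\sqrt{|A_{ij}|}}{(2\pi)^{k/2}}\, e^{-\frac{1}{2}\sum_{i,j=1}^{k} A_{ij} y_i y_j}\, dy_1 \cdots dy_k.$

Let the roots of the characteristic equation $|A^{ij} - \lambda\delta_{ij}| = 0$ be $\lambda_1, \lambda_2, \ldots, \lambda_k$. Let $c_{gi}$
($i = 1, 2, \ldots, k$) be the direction cosines of the $g$-th principal axis of

(t)  $\sum_{i,j=1}^{k} A_{ij} y_i y_j = C,$

and let

(u)  $z_g = \sum_{i=1}^{k} c_{gi} y_i.$

Then

(1) The direction cosines are given by

     $c_{gi} = \frac{y_{gi}}{\sqrt{\sum_{i=1}^{k} y_{gi}^2}},$

where the $y_{gi}$ satisfy the equations

     $\sum_{j=1}^{k} (A^{ij} - \lambda_g \delta_{ij})\, y_{gj} = 0, \quad (i = 1, 2, \ldots, k).$

(2) The length of half the $g$-th principal axis is $\sqrt{\lambda_g C}$.

(3) The principal axes are mutually perpendicular.

(4) The transformation (u) transforms the probability element (s) into

     $\prod_{g=1}^{k} \frac{1}{\sqrt{2\pi\lambda_g}}\, e^{-\frac{z_g^2}{2\lambda_g}}\, dz_g,$

the $z_g$ being independently distributed.

(5) $\sum_{g=1}^{k} \lambda_g = \sum_{i=1}^{k} A^{ii}$, i.e. the sum of the variances of the $y_i$ is equal to the sum of
the variances of the $z_g$.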

If two of the roots of (i) are equal, we would have an indeterminate situation
with reference to two of the principal axes. In this case, there will be a two-dimensional
space, i.e. plane, perpendicular to each of the remaining principal axes such
that the intersection of this plane with (c) is a circle. Similar remarks can be made
about higher multiplicities of roots.

As a simple example in multiplicity of roots, the reader will find it instructive
to consider the case in which the variance of $y_i$ ($i = 1, 2, \ldots, k$) is $\sigma^2$ and the covariance
between $y_i$ and $y_j$ is $\sigma^2\rho$. Equation (i) becomes

     $[\sigma^2(1-\rho) - \lambda]^{k-1}\,[\sigma^2(1+(k-1)\rho) - \lambda] = 0.$

There are roots of two magnitudes, one being $\sigma^2(1-\rho)$ with multiplicity $k-1$; the other
being $\sigma^2(1+(k-1)\rho)$ with multiplicity 1. It is convenient in this case to think of one
long principal axis (if $\rho > 0$) and $k-1$ short ones all equal (although indeterminate in
direction). If $\rho > 0$, then it is clear that the long axis increases as $k$ increases, while
the short axes remain the same. Thus the variance of the $z$ (which is a linear function
of the $y_i$ by transformation (u)) corresponding to the longest axis increases with $k$.

This property of increasing variance of the linear function of several positively intercorrelated
variables associated with the longest axis is fundamental in the scientific
construction of examinations, certain kinds of indices, etc. By continuity considerations
one can verify that the property holds, roughly speaking, even when the variances
(as well as the covariances) of the variables depart slightly from each other.
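The equal-variance, equal-covariance example can be verified numerically. The Python sketch below (NumPy assumed; the function name `equicorrelation_covariance` and the particular values of $k$, $\sigma^2$ and $\rho$ are illustrative choices) builds the variance-covariance matrix, computes its characteristic roots, and compares them with the two magnitudes given above.

```python
import numpy as np

def equicorrelation_covariance(k, sigma2, rho):
    """Matrix with variances sigma2 on the diagonal and covariances sigma2 * rho elsewhere."""
    return sigma2 * ((1.0 - rho) * np.eye(k) + rho * np.ones((k, k)))

k, sigma2, rho = 5, 2.0, 0.3
cov = equicorrelation_covariance(k, sigma2, rho)
roots = np.linalg.eigvalsh(cov)                  # characteristic roots in ascending order

small_root = sigma2 * (1.0 - rho)                # expected with multiplicity k - 1
large_root = sigma2 * (1.0 + (k - 1) * rho)      # expected with multiplicity 1

# The largest root grows with k while the repeated root does not depend on k.
largest_by_k = [np.linalg.eigvalsh(equicorrelation_covariance(m, sigma2, rho))[-1]
                for m in (2, 4, 8)]
```

The sum of the roots equals the trace $k\sigma^2$, in agreement with item (5) of the theorem.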

11.10 Canonical Correlation Theory 

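Let $x_1, x_2, \ldots, x_k$ be random variables divided in two sets $G_1: (x_1, x_2, \ldots, x_{k_1})$
and $G_2: (x_{k_1+1}, \ldots, x_{k_1+k_2})$ ($k_1 + k_2 = k$). We shall assume that $k_1 \leq k_2$. Let $L_1$ and $L_2$ be
arbitrary linear functions of the two groups of variates, respectively, i.e.

(a)  $L_1 = \sum_{i=1}^{k_1} l_{1i} x_i, \qquad L_2 = \sum_{p=k_1+1}^{k_1+k_2} l_{2p} x_p.$

The correlation coefficient between $L_1$ and $L_2$ (see §2.75) is given by

(b)  $R_{12} = \frac{\sum_{i,p} A^{ip}\, l_{1i} l_{2p}}{\sqrt{\sum_{i,j} A^{ij}\, l_{1i} l_{1j} \cdot \sum_{p,q} A^{pq}\, l_{2p} l_{2q}}},$

where $i$ and $j$ in the summations range over the values $1, 2, \ldots, k_1$, while $p$ and $q$ range over
the values $k_1+1, k_1+2, \ldots, k_1+k_2$. $||A^{ip}||$ is the covariance matrix between the variables
in $G_1$ and those in $G_2$; $||A^{ij}||$ is the variance-covariance matrix for variables in $G_1$; a
similar meaning holding for $||A^{pq}||$.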

Now suppose we consider the problem* of varying the $l_{1i}$ and $l_{2p}$ so as to maximize
the correlation coefficient $R_{12}$ (actually to find extrema of $R_{12}$, among which there
will be a maximum). Corresponding to any given solution of this problem, say $l^*_{1i}$, $l^*_{2p}$
($i = 1, 2, \ldots, k_1$; $p = k_1+1, \ldots, k_1+k_2$), there are infinitely many solutions of the form $a\,l^*_{1i}$,
$b\,l^*_{2p}$, where $a$ and $b$ are any two constants of the same sign. To overcome this difficulty,
it is sufficient to seek a solution for fixed values of the variances of $L_1$ and $L_2$, which,
for convenience, we may take as 1. This is equivalent to the determination of the extrema
of $R_{12}$ for variations of the $l_{1i}$ and $l_{2p}$, subject to the conditions

(c)  $\sum_{i,j} A^{ij}\, l_{1i} l_{1j} = 1, \qquad \sum_{p,q} A^{pq}\, l_{2p} l_{2q} = 1.$

*This problem was first considered by H. Hotelling, "Relations Between Two Sets of Variates",
Biometrika, Vol. 28 (1936), pp. 322-377.

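By Lagrange's method this amounts to finding the extrema of the function

(d)  $\phi = \sum_{i,p} A^{ip}\, l_{1i} l_{2p} - \frac{\lambda}{2}\Big(\sum_{i,j} A^{ij}\, l_{1i} l_{1j} - 1\Big) - \frac{\mu}{2}\Big(\sum_{p,q} A^{pq}\, l_{2p} l_{2q} - 1\Big),$

where $\lambda$ and $\mu$ are divided by 2 for convenience. The $l_{1i}$ and $l_{2p}$ must satisfy the equations

(e)  $\frac{\partial \phi}{\partial l_{1i}} = 0,$

(f)  $\frac{\partial \phi}{\partial l_{2p}} = 0,$

which are

(g)  $\sum_{p} A^{ip}\, l_{2p} - \lambda \sum_{j} A^{ij}\, l_{1j} = 0, \quad i = 1, 2, \ldots, k_1,$

(h)  $\sum_{i} A^{ip}\, l_{1i} - \mu \sum_{q} A^{pq}\, l_{2q} = 0, \quad p = k_1+1, \ldots, k.$

Multiplying (g) by $l_{1i}$ and summing with respect to $i$, then multiplying (h) by $l_{2p}$,
summing with respect to $p$, and using (c), we obtain

(i)  $\lambda = \mu = \sum_{i,p} A^{ip}\, l_{1i} l_{2p}.$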

Therefore putting $\mu = \lambda$ in (h), we obtain a system of $k$ linear homogeneous equations in
the $l_{1i}$ and $l_{2p}$. In order to have a solution not identically zero, the $k$-th order determinant
of the equations (g) and (h) must vanish. That is

(j)  $\begin{vmatrix} -\lambda A^{ij} & A^{iq} \\ A^{pj} & -\lambda A^{pq} \end{vmatrix} = 0.$

If we factor $\sqrt{A^{ii}}$ out of the $i$-th row and $i$-th column ($i = 1, 2, \ldots, k_1$) and $\sqrt{A^{pp}}$ out of the
$p$-th row and $p$-th column ($p = k_1+1, \ldots, k_1+k_2$), we find that (j) is equivalent to

(k)  $\begin{vmatrix} -\lambda \rho_{ij} & \rho_{iq} \\ \rho_{pj} & -\lambda \rho_{pq} \end{vmatrix} = 0,$

where the $\rho$'s are correlation coefficients, and $\rho_{ii} = \rho_{pp} = 1$. It can be shown* that the
roots of (k) are all real, since the determinant (k) is the discriminant of $Q_1 - \lambda Q_2$,
where $Q_2$ is the sum of the two quadratic forms in (c), and hence is positive definite.
If the determinant in (k) is expanded by Laplace's method by the first $k_1$ columns (or
rows) it is clear from the resulting expansion that (k) is a polynomial of degree $k_1 + k_2$
in which the lowest power of $\lambda$ is $\lambda^{k_2-k_1}$. Hence by factoring out $\lambda^{k_2-k_1}$ we are left with
a polynomial $f(\lambda)$ in $\lambda$ of degree $2k_1$. Now any term in the Laplace expansion of (k) (by
the first $k_1$ columns) is the product of a determinant of order $k_1$ and one of order $k_2$.
If the first determinant has $r$ rows chosen from the upper left hand block of (k), then
the second determinant will contain $k_2 - (k_1 - r)$ rows from the lower right hand block of
(k). The product of these two determinants will therefore have $\lambda^{k_2-k_1+2r}$ as a factor.
Therefore, by factoring $\lambda^{k_2-k_1}$ from each term in the Laplace expansion of (k), it is clear
that the resulting polynomial, that is $f(\lambda)$, will contain only even powers of $\lambda$. Therefore,
the $2k_1$ roots of $f(\lambda) = 0$ are real and of the form $\pm\lambda_1, \pm\lambda_2, \ldots, \pm\lambda_{k_1}$, where each $\lambda_i$
is $\geq 0$. Let $\lambda_i = \rho_{(2i)}$ and $-\lambda_i = \rho_{(2i-1)}$. Let $l_{u1i}$, $l_{u2p}$ be solutions of the equations
(g) and (h) corresponding to the root $\rho_{(u)}$ ($u = 1, 2, \ldots, 2k_1$) and let $L_{u1}$, $L_{u2}$ be the values
of $L_1$ and $L_2$ in (a) corresponding to the solutions $l_{u1i}$, $l_{u2p}$. Remembering that $\mu = \lambda$,
and inserting the $u$-th root in (g) and (h), we must have

(l)  $\sum_{p} A^{ip}\, l_{u2p} - \rho_{(u)} \sum_{j} A^{ij}\, l_{u1j} = 0, \quad i = 1, 2, \ldots, k_1,$

(m)  $\sum_{i} A^{ip}\, l_{u1i} - \rho_{(u)} \sum_{q} A^{pq}\, l_{u2q} = 0, \quad p = k_1+1, \ldots, k_1+k_2.$

Multiplying (l) by $l_{u1i}$ and summing with respect to $i$, and making use of the fact that
$\sum_{i,j} A^{ij}\, l_{u1i} l_{u1j} = 1$, we have

(n)  $\sum_{i,p} A^{ip}\, l_{u1i} l_{u2p} - \rho_{(u)} = 0.$

The first term in (n) is simply the correlation coefficient between $L_{u1}$ and $L_{u2}$, and its
value is $\rho_{(u)}$. If $u$ is even, then the correlation between $L_{u1}$ and $L_{u2}$ is equal to that
between $L_{(u-1)1}$ and $-L_{(u-1)2}$ (or $-L_{(u-1)1}$ and $L_{(u-1)2}$). It can be easily verified that the
correlation between $L_{(2i)1}$ and $L_{(2j)2}$ ($i \neq j$) is zero. Hotelling has called $L_{u1}$ and $L_{u2}$
the $u$-th canonical variates, and $\rho_{(u)}$ the canonical correlation coefficient between the
canonical variates $L_{u1}$ and $L_{u2}$. Hence, the canonical correlations and therefore the roots
of the equation (k) lie on the interval $(-1, +1)$. If there exists a single largest root,
it is the one such that when it is substituted in (g) and (h) we obtain solutions (i.e.
values of $l_{1i}$ and $l_{2p}$) which, used in (a), will give the linear functions having maximum
correlation. For further details on canonical correlation theory, the reader is referred
to Hotelling's paper.

*M. Bôcher, loc. cit., p. 170.

We may summarize our results in the following

Theorem (A): Let $G_1: (x_1, x_2, \ldots, x_{k_1})$ and $G_2: (x_{k_1+1}, \ldots, x_k)$ be two sets of random
variables where $k = k_1 + k_2$ ($k_1 \leq k_2$). Let $L_1$ and $L_2$, as defined in (a), be linear functions
of the variables in $G_1$ and $G_2$, respectively, such that the variances of $L_1$ and $L_2$
are unity. Let $R_{12}$ be the correlation coefficient between $L_1$ and $L_2$. Then

(1) There are at most $2k_1$ distinct extrema of $R_{12}$ for variations of the $l_{1i}$ and $l_{2p}$
in $L_1$ and $L_2$.

(2) These extrema correspond to the $2k_1$ roots of equation (k), which lie on the
interval $(-1, +1)$ and are symmetrically spaced with respect to the origin.

(3) The value of $R_{12}$ corresponding to the $u$-th root of (k) is equal to the value of the root
itself (the $u$-th canonical correlation coefficient).

(4) The canonical correlation coefficient between the two canonical variates corresponding
to two numerically different values of $\rho_{(u)}$ is zero.

The reader should note that no assumptions have been made about the distribution
function of the two sets of random variables, $G_1$ and $G_2$. We are able to maintain this
degree of generality as long as we are considering canonical correlation theory of populations.
However, the statistical value of this theory may be questionable if the distribution
of the x's in $G_1$ and $G_2$ departs radically from the normal multivariate law. Again,
in studying sampling theory of canonical correlations, progress has been made only for the
case of sampling from normal multivariate populations. Some of the sampling results are
given in §11.11.
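The structure of the roots described in the theorem can be illustrated numerically. The Python sketch below assumes NumPy; the covariance matrix is an arbitrary positive definite matrix generated for illustration, and the variable names are not from the text. It finds the roots of the determinantal equation (j) as the characteristic roots of a partitioned matrix problem, and compares the positive roots with the square roots of the characteristic roots of $S_{11}^{-1} S_{12} S_{22}^{-1} S_{21}$, a standard alternative route to the canonical correlations that is used here as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(3)
k1, k2 = 2, 3
k = k1 + k2
mix = rng.normal(size=(k, k))
cov = mix @ mix.T + k * np.eye(k)               # hypothetical population covariance matrix

s11, s12 = cov[:k1, :k1], cov[:k1, k1:]
s21, s22 = cov[k1:, :k1], cov[k1:, k1:]

# Equation (j): |off_diag - lambda * diag| = 0, solved as a characteristic root problem.
off_diag = np.block([[np.zeros((k1, k1)), s12], [s21, np.zeros((k2, k2))]])
diag = np.block([[s11, np.zeros((k1, k2))], [np.zeros((k2, k1)), s22]])
roots = np.sort(np.linalg.eigvals(np.linalg.solve(diag, off_diag)).real)

# Cross-check: squared canonical correlations from S11^{-1} S12 S22^{-1} S21.
m = np.linalg.solve(s11, s12) @ np.linalg.solve(s22, s21)
squared = np.clip(np.sort(np.linalg.eigvals(m).real), 0.0, None)
canonical = np.sqrt(squared)
```

The sorted roots come out as $k_1$ pairs $\pm\lambda_i$ together with $k_2 - k_1$ zero roots, which corresponds to the factor $\lambda^{k_2-k_1}$ removed from (k) in the argument above.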

11.11 The Sampling Theory of the Roots of Certain Determinantal Equations

In the treatment of the theory of principal components (§11.9) and of the canonical
correlation theory (§11.10), it was found that the roots of certain determinantal
equations in which the matrices are variance-covariance matrices played fundamental roles.
In testing hypotheses concerning principal components, canonical correlations and allied
topics, we are interested in the roots of the analogous equations in which the matrices
are sample variance-covariance matrices. In the following sections, we shall derive the
distributions of the roots of several sample determinantal equations* when the samples
are drawn from certain special multivariate normal populations. The distribution theory
of the roots for more general assumptions has not yet been developed.

11.111 Characteristic Roots of One Sample Variance-covariance Matrix.

Let us consider a sample $O_n: (x_{i\alpha};\ i = 1, 2, \ldots, k;\ \alpha = 1, 2, \ldots, n)$ from a normal
multivariate population, whose variance-covariance matrix has one root $\lambda$ of multiplicity $k$.
The variance-covariance matrix of the population is then of the form

     $||\lambda\,\delta_{ij}||$

and its inverse is

     $||\delta_{ij}/\lambda||.$

*These distributions and their derivations were first published in the papers by R. A.
Fisher, "The Sampling Distribution of Some Statistics Obtained from Non-linear Equations",
Annals of Eugenics, Vol. 9 (1939), pp. 238-249, and by P. L. Hsu, "On the Distribution
of Roots of Certain Determinantal Equations", Annals of Eugenics, Vol. 9
(1939), pp. 250-258. The derivations used in this section were developed by A. M.
Mood (unpublished).

which is analogous to (i), §11.9. For a geometrical interpretation of these roots, the
reader is referred to §11.9.


In §11.9, it was shown that for a matrix $||A_{ij}||$ there is a set of numbers
$c_{gi}$ ($i, g = 1, 2, \ldots, k$) (direction cosines of the principal axes of the family of ellipsoids
$\sum A_{ij} y_i y_j = C$) such that the transformation

     $\sum_{i=1}^{k} c_{gi} y_i = z_g \quad (g = 1, 2, \ldots, k)$

will yield

(c)  $\sum_{i,j=1}^{k} A_{ij} y_i y_j = \sum_{g=1}^{k} \lambda_g z_g^2,$

where the $\lambda_g$ are roots of $|A_{ij} - \lambda\delta_{ij}| = 0$. Expressing the $z_g$ in terms of the $y_i$ in the
second member of (c), we get

     $\sum_{i,j=1}^{k} A_{ij} y_i y_j = \sum_{i,j=1}^{k} \sum_{g=1}^{k} \lambda_g c_{gi} c_{gj}\, y_i y_j.$

Hence,

     $A_{ij} = \sum_{g=1}^{k} \lambda_g c_{gi} c_{gj}, \quad (i, j = 1, 2, \ldots, k).$

In a similar manner we can find numbers $\gamma_{ih}$ ($i, h = 1, 2, \ldots, k$) to express $a_{ij}$ as

(d)  $a_{ij} = \sum_{h=1}^{k} \gamma_{ih}\gamma_{jh}\, l_h, \quad (i, j = 1, 2, \ldots, k),$

where the $l_h$ are the roots of (b) and the $\gamma_{ih}$ are elements of an orthogonal matrix
$||\gamma_{ih}||$; that is, $\sum_h \gamma_{ih}\gamma_{jh} = \delta_{ij}$ and $\sum_i \gamma_{ih}\gamma_{ig} = \delta_{hg}$. The $l_h$ and $\gamma_{ih}$ depend only on
the $a_{ij}$. We can get the simultaneous distribution of the $l_h$ and $\gamma_{ih}$ by substituting (d) in
$w_{n-1,k}(a_{ij}; A_{ij})$ and multiplying by the Jacobian of the transformation (d). Ordering the
$l_h$ so that $l_1 \geq l_2 \geq \cdots \geq l_k \geq 0$, the Jacobian is

(e)  $(l_1 - l_2)(l_1 - l_3) \cdots (l_1 - l_k)(l_2 - l_3) \cdots (l_{k-1} - l_k)\, \phi(\gamma_{ih}),$

where $\phi(\gamma_{ih})$ is a function of the $\gamma_{ih}$ only, not involving the $l_h$. This can be verified in
the following way. It is clear from (d) that the Jacobian will be a polynomial in the
$l_h$; in fact it will be a polynomial of degree $\frac{k(k-1)}{2}$, for there are $\frac{k(k-1)}{2}$ independent
elements in $||\gamma_{ih}||$. If $l_i = l_j$ ($i \neq j$) the transformation (d) will not be uniquely determined,
and hence the Jacobian will be zero, since when a transformation is not
(locally) unique, the Jacobian is zero. This fact implies that we can factor out terms
$(l_i - l_j)$ ($i \neq j$). There are $\frac{k(k-1)}{2}$ such terms, and when they have been factored out, what
remains is independent of the $l_h$, since the Jacobian is a polynomial of degree $\frac{k(k-1)}{2}$.
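Noting that

     $|a_{ij}| = |\gamma_{ih}| \cdot |l_h \delta_{hg}| \cdot |\gamma_{jh}| = \prod_{h=1}^{k} l_h$

and that $\sum_i a_{ii} = \sum_i \sum_h \gamma_{ih}\, l_h\, \gamma_{ih} = \sum_h l_h$, we may write the distribution of the $l_h$ and
the $\gamma_{ih}$ as

(f)  $\frac{\big(\prod_{i=1}^{k} l_i\big)^{\frac{n-k-2}{2}}\, e^{-\frac{1}{2\lambda}\sum_{i=1}^{k} l_i}}{(2\lambda)^{\frac{(n-1)k}{2}}\, \pi^{\frac{k(k-1)}{4}} \prod_{i=1}^{k} \Gamma(\frac{n-i}{2})} \prod_{i<j} (l_i - l_j)\; \phi(\gamma_{ih}).$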

To derive the distribution of the $l_i$ alone, we integrate (f) with respect to the $\gamma_{ij}$ over the space of the $\gamma_{ij}$ for which $\|\gamma_{ij}\|$ is orthogonal, obtaining


k Sikll - 


The constant $K$ is determined by the condition that the integral of (g) over the space $R$ of the $l_i$ is unity. To find $K$, let us first define




)" e’ 


oHdi,. 

i<j ■' 1=1 


We note that 


Since $\prod_{i=1}^{k} l_i = |a_{ij}|$, we have




x,n-k-2 

„ oC — 3 — + s) 


but from the Wishart distribution (see (h) In §ir.it), we find that 

9 ksA-TC^ + a) 

(k) • 



Equating (J) to (k) and setting n we get 




Substituting in (g), we finally obtain as the distribution element of the characteristic roots of (b)



It can be shown fairly readily, by making appropriate orthogonal transformations, that if the sample $O_n$: $(x_{i\alpha};\ i = 1,2,\dots,k;\ \alpha = 1,2,\dots,n)$ is from the normal multivariate population (a) in §11.3 for the case in which the characteristic roots of the variance-covariance matrix are all equal to $A$, say, then the characteristic roots of (b) are also distributed according to (m).
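The factored form of the Jacobian (e) can be checked by hand in the smallest case, $k = 2$. The sketch below parametrizes the orthogonal matrix by a single angle $\varphi$; that parametrization is an assumption made here for illustration and is not taken from the text. With it, the function $\phi(\gamma_{ij})$ reduces to a constant.

```latex
\[
\|\gamma_{ih}\| =
\begin{pmatrix}\cos\varphi & -\sin\varphi\\ \sin\varphi & \cos\varphi\end{pmatrix},
\]
\begin{align*}
a_{11} &= l_1\cos^2\varphi + l_2\sin^2\varphi,\\
a_{12} &= (l_1-l_2)\sin\varphi\cos\varphi,\\
a_{22} &= l_1\sin^2\varphi + l_2\cos^2\varphi,
\end{align*}
\[
\frac{\partial(a_{11},a_{12},a_{22})}{\partial(l_1,l_2,\varphi)}
=
\begin{vmatrix}
\cos^2\varphi & \sin^2\varphi & -2(l_1-l_2)\sin\varphi\cos\varphi\\
\sin\varphi\cos\varphi & -\sin\varphi\cos\varphi & (l_1-l_2)(\cos^2\varphi-\sin^2\varphi)\\
\sin^2\varphi & \cos^2\varphi & 2(l_1-l_2)\sin\varphi\cos\varphi
\end{vmatrix}
= -(l_1-l_2).
\]
```

In absolute value the Jacobian is $l_1 - l_2$ when $l_1 \ge l_2$: a polynomial of degree $\tfrac12 k(k-1) = 1$ in the roots that vanishes when the two roots coincide.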


We may summarize in the following

Theorem (A): Let $O_n$: $(x_{i\alpha};\ i = 1,2,\dots,k;\ \alpha = 1,2,\dots,n)$ be a sample from a normal multivariate population for which the characteristic roots of the variance-covariance matrix are equal to $A$. Let $a_{ij}$ $(i,j = 1,2,\dots,k)$ be the second order sample product sums as defined below (a). Let $l_1, l_2, \dots, l_k$ be the roots (in descending order of magnitude) of $|a_{ij} - l\,\delta_{ij}| = 0$. The joint probability element of the $l_i$ $(i = 1,2,\dots,k)$ is given by (m).
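The algebraic identities used in this derivation are easy to check numerically. The following is a minimal sketch, assuming NumPy and a simulated sample; the names `x`, `a`, `roots` and `gamma` are illustrative and do not come from the text. It forms the product sums $a_{ij}$, extracts the characteristic roots and an orthogonal matrix of characteristic vectors, and checks the decomposition of $a_{ij}$ together with the determinant and trace relations.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 20

# Hypothetical data: n observations (rows) on k variates (columns).
x = rng.normal(size=(n, k))

# Second order sample product sums a_ij about the sample means.
centered = x - x.mean(axis=0)
a = centered.T @ centered

# Characteristic roots l_h and an orthogonal matrix ||gamma_ih||.
roots, gamma = np.linalg.eigh(a)            # eigh returns roots in ascending order
roots, gamma = roots[::-1], gamma[:, ::-1]  # rearrange so that l_1 >= ... >= l_k

# a_ij = sum_h gamma_ih * gamma_jh * l_h
a_rebuilt = (gamma * roots) @ gamma.T

print(np.allclose(a, a_rebuilt))                    # decomposition of a_ij
print(np.allclose(gamma @ gamma.T, np.eye(k)))      # orthogonality
print(np.isclose(np.linalg.det(a), roots.prod()))   # |a_ij| = product of the roots
print(np.isclose(np.trace(a), roots.sum()))         # sum of a_ii = sum of the roots
```

The check says nothing about the distribution itself; it only confirms the identities that let the Wishart density be rewritten in terms of the roots.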

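11.112 Characteristic Roots of the Difference of Two Sample Variance-covariance Matrices.

Let us consider two samples $O_{n_1}$: $(x^1_{i\alpha};\ i = 1,2,\dots,k;\ \alpha = 1,2,\dots,n_1)$ and $O_{n_2}$: $(x^2_{i\alpha};\ i = 1,2,\dots,k;\ \alpha = 1,2,\dots,n_2)$ $(n_1 > k,\ n_2 > k)$ drawn from the same normal multivariate population

(a) $$\frac{\sqrt{|A_{ij}|}}{(2\pi)^{k/2}}\; e^{-\frac12 \sum_{i,j=1}^{k} A_{ij}(x_i-a_i)(x_j-a_j)}.$$

Let $a^t_{ij} = \sum_{\alpha=1}^{n_t} (x^t_{i\alpha}-\bar x^t_i)(x^t_{j\alpha}-\bar x^t_j)$, $(t = 1,2)$. In this section, we shall derive the distribution of the roots of

(b) $$|\theta(a^1_{ij}+a^2_{ij}) - a^2_{ij}| = 0.$$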

In §11.9, we have seen that there is a linear transformation 


‘1 - >1 - h 


C Z 

g 


( I** 1y2|»e«|lc) 


such that 




Now let 


an a W . 
Ag g' 


( y 2y • • • |k)^ 


'i ■ ®i “ ^Vgi"g» 


( l."1 I 2^ e e • |lC ) • 


Then 


The transformation (e) when performed on the sample values gives us


(t-1,2) 


where 


‘^gl " "^"gl- 


Now equation (b) becomes 


e(a]^j+a^j)-a^j| - j+^1 jH)i jl^j 


1 .-^2 V V .2 


- ">gii ■ • I'Jhji - “• 


Clearly the roots of 




are the same as those of equation (b). Note that the $b^t_{ij}$ are functions of the $w^t_{i\alpha}$, such that for each value of $i$, $t$, and $\alpha$, $w^t_{i\alpha}$ is distributed according to a normal law with zero mean and unit variance, the $w^t_{i\alpha}$ being independently distributed. This shows that we lose no generality by assuming that $A_{ij} = 1$ if $i = j$, and $A_{ij} = 0$ if $i \ne j$.

Under this assumption, the $a^t_{ij}$ have the distribution






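$$\frac{|a^t_{ij}|^{\frac{n_t-k-2}{2}}\; e^{-\frac12\sum_{i=1}^{k} a^t_{ii}}}{2^{\frac{k(n_t-1)}{2}}\,\pi^{\frac{k(k-1)}{4}}\prod_{i=1}^{k}\Gamma\!\left(\frac{n_t-i}{2}\right)} \qquad (t = 1,2).$$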




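From a theorem in algebra (see M. Bocher, loc. cit. p. 171) we know that there is a transformation of the $a^t_{ij}$ such that

(h) $$a^1_{ij} = \sum_{h=1}^{k} u_{ih}u_{jh}(1-\theta_h), \qquad a^2_{ij} = \sum_{h=1}^{k} u_{ih}u_{jh}\,\theta_h \qquad (i,j = 1,2,\dots,k)$$

where the $\theta_h$ are the roots of (b) (arranged, say, in descending order of magnitude). The $u_{ij}$ and the $\theta_i$ are functions of $a^1_{ij}$ and $a^2_{ij}$; hence, their distribution may be found by substituting in the joint distribution of the $a^1_{ij}$ and $a^2_{ij}$ and multiplying by the Jacobian of the transformation (h). By following a procedure similar to that of §11.111, we can show that the Jacobian of (h) is

(i) $$\prod_{i<j}(\theta_i-\theta_j)\;\psi(u_{ij}),$$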




where $\psi(u_{ij})$ is a function of the $u_{ij}$ independent of the $\theta_i$. Hence, the simultaneous distribution of the $\theta_i$ and $u_{ij}$ is


iXu., u 


ih^jh -^,''ihVjh 




(n^-1) k(k;-i 






is 


Crig^rkllc-i 




fe(V«j)1'(Ulj)- 


i<j»i 


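Noting that $\big|\sum_h u_{ih}(1-\theta_h)u_{jh}\big| = |u_{ih}|\cdot\prod_h(1-\theta_h)\cdot|u_{jh}|$ and $\big|\sum_h u_{ih}\,\theta_h\,u_{jh}\big| = |u_{ih}|\cdot\prod_h\theta_h\cdot|u_{jh}|$, we see that the distribution factors into a function of the $\theta_i$ and a function of the $u_{ij}$: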


(k) C 


A. 


1-1 ^ 

h-1 ^ 


n2-k-2 


ri^v®j) ’ i^ij 


1 2 

o ^ U. . 






where $C$ is a constant. On integrating with respect to the $u_{ij}$ we get the marginal distribution of the $\theta_i$,

(l) $$K \prod_{i=1}^{k}(1-\theta_i)^{\frac{n_1-k-2}{2}} \prod_{i=1}^{k}\theta_i^{\frac{n_2-k-2}{2}} \prod_{i<j}(\theta_i-\theta_j).$$

$K$ is a constant to be determined so the integral of (l) over the range of the $\theta_i$ is unity. Following a procedure similar to that used in determining $K$ in §11.111, we evaluate $K$ in (l) and obtain as the distribution element of the $\theta_i$



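(m) $$\pi^{\frac{k}{2}} \prod_{i=1}^{k} \frac{\Gamma\!\left(\frac{n_1+n_2-i-1}{2}\right)}{\Gamma\!\left(\frac{n_1-i}{2}\right)\Gamma\!\left(\frac{n_2-i}{2}\right)\Gamma\!\left(\frac{k+1-i}{2}\right)} \prod_{i=1}^{k}(1-\theta_i)^{\frac{n_1-k-2}{2}} \prod_{i=1}^{k}\theta_i^{\frac{n_2-k-2}{2}} \prod_{i<j}(\theta_i-\theta_j) \prod_{i=1}^{k} d\theta_i.$$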
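A one-variate check may help fix ideas. The sketch below assumes that the determinantal equation of this section is read as $\theta(a^1+a^2) - a^2 = 0$ when $k = 1$, and that the population variance is unity; both are assumptions of the illustration. Then $a^1$ and $a^2$ are independent sums of squares with $n_1-1$ and $n_2-1$ degrees of freedom, and the single root has the familiar Type I (beta) distribution.

```latex
\[
\theta = \frac{a^2}{a^1 + a^2},
\qquad
\frac{\Gamma\!\left(\frac{n_1+n_2-2}{2}\right)}
     {\Gamma\!\left(\frac{n_1-1}{2}\right)\Gamma\!\left(\frac{n_2-1}{2}\right)}
\;\theta^{\frac{n_2-3}{2}}\,(1-\theta)^{\frac{n_1-3}{2}}\,d\theta,
\qquad 0 \le \theta \le 1 .
\]
```

With $k = 1$ the product over $i<j$ is empty and $\pi^{1/2}/\Gamma(\tfrac12) = 1$, so a general formula of the multi-root kind must collapse to exactly this expression.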


It should be emphasized that distribution (m) holds for the roots of (b) where the $a^1_{ij}$ and the $a^2_{ij}$ are any two sets of random variables distributed independently according to the Wishart distributions

(n) $$w_{n_1-1,k}(a^1_{ij}; A_{ij}), \qquad w_{n_2-1,k}(a^2_{ij}; A_{ij})$$

where $n_1$ and $n_2$ are both $> k$. In fact, we may summarize our results in

Theorem (A): Let $a^1_{ij}$ and $a^2_{ij}$ be independently distributed according to the Wishart distributions (n). Let $\theta_1, \theta_2, \dots, \theta_k$ (in descending order) be the roots of the equation $|\theta(a^1_{ij}+a^2_{ij}) - a^2_{ij}| = 0$. Then the joint probability element of the $\theta_i$ $(i = 1,2,\dots,k)$ is given by (m), where the range of the $\theta$'s is $1 \ge \theta_1 \ge \theta_2 \ge \dots \ge \theta_k \ge 0$.
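To see what the roots of such a determinantal equation look like in practice, here is a minimal numerical sketch. It assumes NumPy, simulated samples from a unit-variance normal population, and the reading $|\theta(a^1+a^2) - a^2| = 0$ of the equation; the helper names `product_sums` and `theta_roots` are hypothetical. The equation is reduced to a symmetric characteristic-root problem through a Cholesky factor of $a^1 + a^2$.

```python
import numpy as np

def product_sums(x):
    """Second order sample product sums about the sample means."""
    centered = x - x.mean(axis=0)
    return centered.T @ centered

def theta_roots(a1, a2):
    """Roots of |theta*(a1 + a2) - a2| = 0, largest first."""
    chol = np.linalg.cholesky(a1 + a2)     # a1 + a2 = chol @ chol.T
    half = np.linalg.solve(chol, a2)       # chol^{-1} a2
    sym = np.linalg.solve(chol, half.T)    # chol^{-1} a2 chol^{-T}, symmetric
    sym = (sym + sym.T) / 2                # remove rounding asymmetry
    return np.linalg.eigvalsh(sym)[::-1]

rng = np.random.default_rng(1)
k, n1, n2 = 3, 12, 15
a1 = product_sums(rng.normal(size=(n1, k)))
a2 = product_sums(rng.normal(size=(n2, k)))

theta = theta_roots(a1, a2)

# Each root should make the determinant vanish (after dividing out |a1 + a2|).
ratio = np.linalg.solve(a1 + a2, a2)
residuals = [np.linalg.det(t * np.eye(k) - ratio) for t in theta]

print(theta)
print(max(abs(r) for r in residuals))
```

Because both matrices are positive definite when $n_1, n_2 > k$, every root falls strictly between 0 and 1, which matches the range stated for the $\theta$'s.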

11.113 Distribution of the Sample Canonical Correlations.

Corresponding to the population canonical correlations discussed in §11.10, there are canonical correlations of a sample. In this section, we shall determine the distribution of the sample canonical correlations when the smaller set of variates has a normal multivariate distribution independent of the other set.

Consider a sample $O_n$: $(x_{u\alpha};\ u = 1,2,\dots,k_1+k_2;\ \alpha = 1,2,\dots,n)$ from a population where the first $k_1$ variates have a normal distribution and the remaining $k_2$ variates are distributed independently of the first $k_1$ $(k_1 \le k_2)$. Let

$$a_{uv} = \sum_{\alpha=1}^{n}(x_{u\alpha}-\bar x_u)(x_{v\alpha}-\bar x_v), \qquad (u,v = 1,2,\dots,k_1+k_2).$$


The canonical correlations of the sample are defined as the roots of

(a) $$\begin{vmatrix} l\,a_{ij} & a_{iq} \\ a_{pj} & l\,a_{pq} \end{vmatrix} = 0 \qquad (i,j = 1,2,\dots,k_1;\ p,q = k_1+1,\dots,k_1+k_2).$$

Multiplying each of the first $k_1$ columns by $l$ and then factoring $l$ out of each of the last $k_2$ rows, we see that (a) is equivalent to

(b) $$\begin{vmatrix} l^2 a_{ij} & a_{iq} \\ a_{pj} & a_{pq} \end{vmatrix} = 0$$

except for a factor of $l^{k_2-k_1}$. Since we are not interested in the roots which are identically zero, we shall confine our attention to the roots of (b).

Let $a^{pq}$ be the element corresponding to $a_{pq}$ in the inverse of $\|a_{pq}\|$ $(p,q = k_1+1,\dots,k_1+k_2)$. After multiplication on the left by

(c) 


^1 

i k,+k„ 

0 

P 

CO 


'V" 


equation (b) becomes 


(d) 



k^ *^1+^2 

^ a. a^a 


k^ 

- '^a^Pa„„ 


0 , 


( i,2,«**,k^^ r, 3“k i|4‘l,**»,k^ +k g ) • 

Since each member in the upper right hand block is 0 and since each element in the lower right hand block is $\delta_{rs}$, (d) can be reduced to

(e) $$\Big|\, l^2 a_{ij} - \sum_{p,q=k_1+1}^{k_1+k_2} a_{ip}\,a^{pq}\,a_{qj} \Big| = 0.$$

The roots of (e) (which are also roots of (b)) are the sample canonical correlation coefficients. Let the squared roots (in descending order) be $l_1^2, l_2^2, \dots, l_{k_1}^2$. We observe that
that 


(f) 


k^ -t-kg 


p,j>«k^+i 




^r 


"pj 


pr 


"pj 


pr 


where, in the determinants on the right, $i$ and $j$ are fixed but $p,r = k_1+1,\dots,k_1+k_2$. Let this value be $b_{ij}$. If we consider the $x_{p\alpha}$ $(\alpha = 1,2,\dots,n;\ p = k_1+1,\dots,k_1+k_2)$ fixed with $\|a_{pq}\|$ positive definite, then $a_{ij}$ and $b_{ij}$ are bilinear forms in $x_{i\alpha}$ and $x_{j\beta}$ $(\alpha,\beta = 1,2,\dots,n;\ i,j = 1,2,\dots,k_1)$; i.e., $a_{ij}$ may be written as $\sum_{\alpha,\beta} G_{\alpha\beta}\,x_{i\alpha}x_{j\beta}$, where $\|G_{\alpha\beta}\|$ is of rank $n-1$, and $b_{ij}$ may be written as $\sum_{\alpha,\beta} H_{\alpha\beta}\,x_{i\alpha}x_{j\beta}$, where $\|H_{\alpha\beta}\|$ is of rank $k_2$, $H_{\alpha\beta}$ being a function of the fixed $x_{p\alpha}$. $a_{ij}$ and $b_{ij}$ are, therefore, bilinear forms in the $x_{i\alpha}$ and $x_{j\beta}$ having matrices which do not depend on $i$ and $j$. By Cochran's Theorem, we know that there is a transformation which applied to the $x_{i\alpha}$ would make



Applying this same transformation to each set we get 

*'ij -iJvK 

The $y$'s are normally and independently distributed with zero means and equal variances. Thus (e) may be written in the form

$$|\,l^2(c_{ij}+b_{ij}) - b_{ij}\,| = 0$$

where $c_{ij} = a_{ij} - b_{ij}$, the $c_{ij}$ and $b_{ij}$ being independently distributed according to Wishart distributions $w_{n-k_2-1,\,k_1}(c_{ij}; A_{ij})$ and $w_{k_2,\,k_1}(b_{ij}; A_{ij})$, where $\|A_{ij}\|$ is some positive definite matrix.

Therefore, it follows from the results of §11.112 that the squares of the roots of (e) (i.e. the squares of the canonical correlation coefficients) have the distribution (m) where $n_1 = n-k_2$, $k = k_1$ and $n_2 = k_2+1$. That is, the distribution is



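(g) $$\pi^{\frac{k_1}{2}} \prod_{i=1}^{k_1} \frac{\Gamma\!\left(\frac{n-i}{2}\right)}{\Gamma\!\left(\frac{n-k_2-i}{2}\right)\Gamma\!\left(\frac{k_2+1-i}{2}\right)\Gamma\!\left(\frac{k_1+1-i}{2}\right)} \prod_{i=1}^{k_1}(1-\theta_i)^{\frac{n-k_1-k_2-2}{2}} \prod_{i=1}^{k_1}\theta_i^{\frac{k_2-k_1-1}{2}} \prod_{i<j}(\theta_i-\theta_j) \prod_{i=1}^{k_1} d\theta_i,$$

where $\theta_i = l_i^2$, $i = 1,2,\dots,k_1$.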

We may summarize our results in

Theorem (A): Let $O_n$: $(x_{i\alpha};\ i = 1,2,\dots,k_1+k_2;\ k_1 \le k_2;\ \alpha = 1,2,\dots,n)$ be a sample from a population in which the first $k_1$ variates are distributed according to a normal multivariate distribution, but independently of the remaining variates (which may have any arbitrary distribution or may be "fixed" variates). Let $a_{uv}$ $(u,v = 1,2,\dots,k_1+k_2)$ be the second order sample product sums as defined above (a), and let $l_1^2, l_2^2, \dots, l_{k_1}^2$ be the squared roots (squared canonical correlation coefficients) of equation (b). The joint distribution element of the $l_i^2$ $(i = 1,2,\dots,k_1)$ is given by (g) where $\theta_i = l_i^2$, and where the range of the $l_i^2$ is such that $1 \ge l_1^2 \ge l_2^2 \ge \dots \ge l_{k_1}^2 \ge 0$.
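The reduction to a $k_1$-rowed determinantal equation can be illustrated numerically. The following is a minimal sketch, assuming NumPy and simulated data in which the two sets of variates are independent; all variable names are illustrative. It computes the squared sample canonical correlations as the roots of $|l^2 a_{ij} - \sum a_{ip}a^{pq}a_{qj}| = 0$ and compares them with an independent computation based on orthonormal bases of the two centered sets (a standard linear-algebra route, not the method of the text).

```python
import numpy as np

rng = np.random.default_rng(2)
k1, k2, n = 2, 3, 40

# Hypothetical sample: the first k1 columns are the normal set, the rest the other set.
x = rng.normal(size=(n, k1 + k2))
centered = x - x.mean(axis=0)

# Second order sample product sums, partitioned into blocks.
a = centered.T @ centered
a11, a12, a22 = a[:k1, :k1], a[:k1, k1:], a[k1:, k1:]

# b_ij = sum_{p,q} a_ip a^{pq} a_qj
b = a12 @ np.linalg.solve(a22, a12.T)

# Roots of |l^2 a_ij - b_ij| = 0 through a Cholesky factor of ||a_ij||.
chol = np.linalg.cholesky(a11)
half = np.linalg.solve(chol, b)
sym = np.linalg.solve(chol, half.T)
sym = (sym + sym.T) / 2
l_squared = np.linalg.eigvalsh(sym)[::-1]   # descending order

# Cross-check: singular values of Q1' Q2 for orthonormal bases Q1, Q2.
q1, _ = np.linalg.qr(centered[:, :k1])
q2, _ = np.linalg.qr(centered[:, k1:])
singular = np.linalg.svd(q1.T @ q2, compute_uv=False)

print(l_squared)
print(np.allclose(l_squared, singular ** 2))
```

Only $k_1$ nonzero roots appear, in line with the remark that the identically zero roots may be set aside.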

Analysis of covariance, 195

extension to several fixed 
variates, 199 

Analysis of covariance table, 198 
Analysis of variance, 176 

for Incomplete lay-outs, 195 
for Graeco-Latin square, 192 
for Latin square, 189 
for randomized blocks, 180 
for two-way layout, 1 80 
for three-way layout , 1 86 
multivariate extension of, 250 
Average outgoing quality limit, 225 
Average quality protection, 223 
Beta function, 75 
Binomial distribution, kj 
Bernoulli case, 49 
moment generating function of, 48
negative, 56 
Poisson case, 49 

Binomial population, confidence limits 
of p in large samples from, 129 

Borel-measurable point set, io 

Canonical correlation coefficient, 259 

Canonical correlation coefficients, 
distribution of. In samples, 270 

Canonical variate, 259 

C. d. f. (cumulative distribution
function), 5

Central limit theorem, 81 

Characteristic equation of a variance- 
covariance matrix, 254 

Characteristic function, 82 
Characteristic roots 

of difference of two sample variance- 
covariance matrices, distribution 
of, 268 

of sample variance- covariance matrix, 
distribution of, 264 

Chi square distribution, i 02 

moment generating function of, 74 

moments of, 105 

reproductive property of, 105 


Chi-square problem, Pearson’s original, 21 7 
Cochran’s Theorem, 107 
Complete additivity, law of, 6 

Component quadratic forms, resolving 
quadratic Into, 168 

Conditional probability, 15 
Conditional probability density function, 17 
for normal bivariate distribution, 62 
for normal multivariate distribution, 71 
Confidence coefficient, 124 
Confidence Interval, 124 
Confidence limits, 124
from large samples , 127 
graphical representation of, 126 

of difference between means of two normal 
populations with same variance, 130 

of mean of normal population, 150 

of p In large samples from binomial 
population, 1 29 

of range of rectangular population, 123 

of ratio of variances of two normal 
population, 131 

of regression coefficients, 1 59 
of variance of normal population, 131 
Confidence region, 13? 

Confounding, 1 86 

Contagious distribution function, 55 
Consistency of estimate, 133 
Consumer’s risk, 222 
Contingency table, 214 

Chi-square test of Independence In, 216 

likelihood ratio test for Independence 
in, 220 

Continuous distribution function, 
bivariate case, io 

univariate case, 8 

Convergence, stochastic, 81 

Correlation coefficient, 32 

between two linear functions of random 
variables, 3^ 

canonical, 260 

canonical distribution of, 270 
distribution of, 120 
Correlation coefficient (con't)
multiple, 45

multiple, distribution of, in samples
from normal multivariate popula-
tion, 244

partial, 42 
Covariance, 52 
analysis of, 195 

between two linear functions of 
random variables, 54 

Critical region of a statistical test, 152 
Cumulative distribution function, 
bivariate case, 8 
k- variate case, 11 
continuous case, 10 
continuous, univariate case, 8 
discrete, bivariate case, 10 
empirical, 2
mixed case, 11 

postulates for, bivariate case, 9 
postulates for, k-varlate case, 12 
postulates for, univariate case, 5 
univariate case, 5 
Curve fitting, 

by maximum likelihood, 145 
by moments, 145 
Curvilinear regression, 1 66 

Difference between two sample means, 
distribution of, 100 

Difference of point sets, 5 
Discrete distribution function, 
bivariate case, 10 
univariate case, 7 
Disjoint point sets, 5 
Distribution function, 

binomial, 4 ? * 

contagious, 55 

cumulative, bivariate case, 8 
cumulative, univariate case, 5 
discrete, univariate case, 7

limiting, of maximum likelihood esti- 
mates In large samples, 158 

marginal, 12 
multinomial, 51 
negative binomial, 54 
•normal bivariate, 59 
normal multivariate, 65 


Distribution function (con't)
normal or Gaussian, 58 

of canonical correlation coefficients, 270 

of characteristic roots of difference of 
sample variance -covariance matrices, 268 

of characteristic roots of sample 
variance- covariance matrix, 264 

of correlation coefficient, 120 

of difference between means of two samples 
from a normal population, 100 

of exponent In normal multivariate popula- 
tion, io 4 

of Fisher's z, 115 

of Hotelling's generalized "Student" ^ 
ratio, 25B 

of largest variate in sample, 91 

of likelihood ratio for generalized "Student 
statistical hypothesis, 238 

of linear function of normally distributed
variables, 99

of means in samples from a normal bivariate 
population, 100, loix/ 

of means In samples from a normal multi- 
variate population, 101 

of median of sample, 91 

of multiple correlation coefficient In 
samples from normal multivariate popula- 
tion, 244 

of number of correct matchings in random 
matching, 21 0 

of number of trials required to obtain a 
given number of "successes", 55 

of order statistics, 90 

of range of sample, 9f ^ 

of regression coefficients, k fixed variates 
162 

of regression coefficients, one fixed 
variate, 159 

of runs, 201 

of sample mean, limiting, in large samples, 
81 

of second order sample moments In samples 
from normal bivariate population, 1 1 6 

of smallest variate In sample, 91 

of Snedecor's F ratio, 114 

of "Student's" ratio, 110 

of sums of squares In samples from normal 
population, 1 02 

of total number of runs, 205 
Poisson, 5^ 

Polya- Eggenbe rger , 5 6 
Type I, 76 
Distribution function (con't)

Type III, 72
Wishart, 120

Wishart, geometric derivation of, 227

Distribution functions, Pearson system 
of, 72 

Efficiency of estimates, 15^ 

Equality of means, 

of normal populations, test of, 176 

test for, in normal multivariate popu-
lation, 238

Estimation, 

by Intervals, 122 
by points, 135 
Estimates, 

consistency of, 133 
efficiency of, 13^ 
maximum likelihood, 136 
optimum, 133 
sufficiency of, 135 
unbiased, 133 
Expected value, 28 
Factorial moments, 204 
P distribution, Snedecor’s, 114 
Fiducial limits, 126 
Finite population, sampling from, 83 
Fisher’s z distribution, 115 
Fixed Variate, 16 
Gamma function, 73 
Gaussian distribution function, 56 
Generalized sum of squares, 229
Graeco-Latin square, 190
analysis of variance for, 192
Gram-Charlier series, 76
Grouping, corrections for, 94
Harmonic analysis, 166
Hermite polynomials, 77

Hotelling's generalized "Student"
ratio, 258

Incomplete lay-outs, 192 
Independence, 
linear, 1 60 

In probability sense, 13 

of mean and sum of squared deviations 
In samples from normal population, 108 

of means and second order sample moments 
In samples from normal bivariate 
population, 120 


Independence (con’t) 

of means and second order moments In 
samples from normal multivariate 
population, 120, 233 

of sets of variates, test for. In normal 
multivariate population, 242 

mutual, 14 

statistical, 13 

Inspection, sampling, 220 

Interaction, 

first order, i 8 l 

second order, 184 

Jacobian of a transformation, 

for k variables, 28 

for two variables, 25 

Joint moments of several random variables, 31 
Lagrange multipliers, 97 
Laplace transform, 38 
Large numbers, law of, 50 
Large samples, confidence limits from, 127 
largest variate In sample, distribution of, 91 
Latin square, 186 
analysis of variance for, 189 
complete set of, 191 
orthogonal, 190 
Law of complete additivity, 6 
law of large numbers, 30 
Least square regression function, 44 
variance about, 44 
Least squares, 43 
Likelihood, 136 
Likelihood ratio, 150 
Likelihood ratio test, 150 
in large samples, 151 

for equality of means in normal multivariate
populations, 238 

for general linear regression statistical 
hypothesis for normal multivariate 
population, 247 

for general normal regression statistical
hypothesis, 170

for independence in contingency tables, 220 

for Independence of sets of variates In 
normal multivariate population, 242 

for "Student" hypothesis, 130 

for the statistical hypothesis that means 
In a normal multivariate population have 
specified values, 233 



Limiting form of cumulative distribution
function as determined by
limiting form of moment generating
function, 38

Linear combination of random variables, 
mean and variance of, 33 

Linear combinations of random variables, 
covariance and correlation coef- 
ficient between, 3^ 

Linear functions of normally distributed 
variables, distribution of, 99 

Linear independence, l 6 o 
Linear regression, 4 o 
generality of, 165 

Linear regression statistical hypothesis, 
likelihood ratio test for, in normal 
multivariate populations, 24 ? 

Lot quality protection, 223 
Marginal distribution function, 12
Matching theory, 

for three or more decks of cards, 212 
for two decks of cards, 208 
Matrix, 63 

Maximum likelihood, curve fitting by, i 45 
Maximum likelihood estimate, 136 
Maximum likelihood estimates, 

distribution of, in large samples, 13B 
of transformed parameters, 139 

Mean of Independent random variables, 
moment generating function of, 82 

Mean value, 29 

of linear function of random 
variables, 35 

of sample mean, 80 
of sample variance, 83 
Means, 

distribution of difference between, in 
samples from normal population, 100 

distribution of, in samples from a normal 
bivariate population, 100 

distribution of, in samples from a normal 
multivariate population, 101 

Median of sample, distribution of, 91 

M. g. f. (moment generating function), 56 

Moment generating function, 36 

of binomial distribution, 48 

of Chi-square distribution, 74 

of mean of independent random 
variables, 82 

of multinomial distribution, 91 
of negative binomial distribution, 34 


Moment generating function (con't)
of normal bivariate distribution, 60 
of normal distribution, 57 
of normal multivariate distribution, 70 
of Poisson distribution, 53 

of second order moments in samples from a 
normal bivariate population, ii8 

Moment Problem, 35 
Moments, 

curve-fitting by, 1 45 
factorial, 204 

joint, of several random variables, 31 
of a random variable, 50 
Multinomial distribution, 51 

moment generating function of, 51 
Multiple correlation, 42 

Multiple correlation coefficient, distribution 
of, in samples from normal multivariate 
population, 244 

Negative binomial distribution, 54 
moment generating function of, 54 

Neyman- Pearson theory of statistical tests, 

152 

Normal bivariate distribution, 59 

conditional probability density function 
for, 62 

moment generating function of, 60 

regression function for, 62 

distribution of means in samples from, 101 

distribution of second order moments in 
samples from, 1 1 6 

independence of means and second order 
moments in samples from, 120 

Normal distribution, 56 
moment generating function of, 58 
reproductive property of, '98 

Normally distributed variables, distribution
of linear function of, 99

Normal multivariate distribution, 63 

conditional probability density function 
for, 71 

distribution of exponent in, 104 
distribution of subset of variables in, 68 
moment generating function of, 70 
regression function for, 71 
variance- covariance matrix of, 68 
Normal multivariate population, 

distribution of means in samples from, 101 

distribution of multiple correlation 
coefficient in samples from, 244 



Normal multivariate population (con't)

distribution of second order moments
in samples from a, 232

general linear regression statistical
hypothesis for, 247

generalized "Student" test for, 23h

independence of means and second
order moments in samples from, 120, 233

test for independence of sets of
variables in, 242

Normal multivariate populations, test 
for equality of means In, 238 

Normal population, 

distribution of means in samples

from, 100

distribution of sums of squares in
samples from, 102

independence of mean and sum of squared
deviations in samples from, 106

Normal populations,

distribution of difference between means
in samples from, 100

test of equality of means of several, 176

Normal regression, i tM

fundamental theorem on testing hypothesis
in, 170

k fixed variates, 1 no
one fixed variate, i ty
Nuisance parameters, 1 ^ 0
Null hypothesis, Ity
Optimum estimate, 133

Order statistics,

distribution theory of, 89

Ordering within samples, test for randomness
of, 207

Parallelogram, area of, • »

Parallelotope, volume of, P28
Partial correlation, I'’
coefficient, 42

P. d. f. (probability density function), 8
Pearson system of distribution functions, 72
Pearson's original chi-square problem, 217
Point set,
difference,
product, 9
sum, 5

Poisson distribution, t'"'

moment generating function of, 23


Polya- Eggenberger distribution, 55 

Population parameter, admissible set of 
values of, Uy 

Population parameters. 

Interval estimation of, 122 
point estimation of, 153 
Positive definite matrix, 63 
Positive definite quadratic form, 
k variables, 65 
two variables, 59 

Power curve of a statistical test, 154 
Power of a statistical test, 152 
Principal axes, 252 

direction cosines of, 25^^
relative lengths of, 256
Principal components of a variance, 255
Probability, conditional, 15
Probability density function, 8
bivariate case, ii
conditional, 17
Probability element, 8 
Probable error, 58 
Producer’s risk, 222 
Product of point sets, 5 
Quadratic form, 

positive definite, 59

resolving, into component quadratic
forms, 168

Quality control, statistical, 221
Quality limit, average outgoing, 223
Quality protection,
average, 223
lot, 223

Randomized blocks, 1 y y
Randomness, 2

Randomness of ordering within samples,
test for, 207

Random sample, definition of, 79 
Random variable, definition of, 6 
Range of sample, distribution of, 92 
Rectangular population, 

confidence limits of range of, 125
distribution of range in samples from a, 92
Regression, 40
Regression coefficient, 40















